ASYMPTOTICALLY MOST POWERFUL TESTS OF STATISTICAL HYPOTHESES¹

By Abraham Wald* 

Columbia University, New York City

1. Introduction. Let $f(x, \theta)$ be the probability density function of a variate
$x$ involving an unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by
means of $n$ independent observations $x_1, \dots, x_n$ on $x$ we have to choose a region
of rejection $W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the
probability that the sample point $E = (x_1, \dots, x_n)$ will fall in $W_n$ under the
assumption that $\theta$ is the true value of the parameter. For any region $U_n$ of
the $n$-dimensional sample space denote by $g(U_n)$ the greatest lower bound of
$P(U_n \mid \theta)$ with respect to $\theta$. For any pair of regions $U_n$ and $T_n$ denote by
$L(U_n, T_n)$ the least upper bound, with respect to $\theta$, of

$P(U_n \mid \theta) - P(T_n \mid \theta)$.

In all that follows we shall denote a region of the $n$-dimensional sample space
by a capital letter with the subscript $n$.

Definition 1. A sequence $\{W_n\}$ $(n = 1, 2, \dots,$ ad inf.$)$ of regions is said to
be an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of
significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for
which $P(Z_n \mid \theta_0) = \alpha$, the inequality

$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$

holds.

Definition 2. A sequence $\{W_n\}$ $(n = 1, 2, \dots,$ ad inf.$)$ of regions is said
to be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$
on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n \to \infty} g(W_n) = \alpha$, and if for any
sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n \to \infty} g(Z_n) = \alpha$, the inequality

$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$

holds.

Let $\hat\theta_n(x_1, \dots, x_n)$ be the maximum likelihood estimate of $\theta$ in the $n$-dimensional
sample space. That is to say, $\hat\theta_n(x_1, \dots, x_n)$ denotes the value of $\theta$


1 Presented to the American Mathematical Society at New York, February 24, 1940.
* Research under a grant-in-aid from the Carnegie Corporation of New York. 
for which the product $\prod_{\alpha=1}^{n} f(x_\alpha, \theta)$ becomes a maximum. Let $W'_n$ be the region
defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge c'_n$, $W''_n$ the region defined by the inequality
$\sqrt{n}(\hat\theta_n - \theta_0) \le c''_n$, and let $W_n$ consist of all points for which at least one of
the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \ge a_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \le -a_n$

is satisfied. The constants $a_n$, $c'_n$, $c''_n$ are chosen such that

$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$

It will be shown in this paper that under certain restrictions on the probability
density $f(x, \theta)$ the sequence $\{W'_n\}$ is an asymptotically most powerful test of the
hypothesis $\theta = \theta_0$ if $\theta$ takes only values $\ge \theta_0$. Similarly $\{W''_n\}$ is an asymptotically
most powerful test if $\theta$ takes only values $\le \theta_0$. Finally $\{W_n\}$ is an
asymptotically most powerful unbiased test if $\theta$ can take any real value.
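
The three regions just defined can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the exponential density $f(x, \theta) = \theta e^{-\theta x}$, the sample size, the alternative $\theta = 1.5$ and every function name are chosen for the example and do not come from the paper, and the constants are empirical quantiles that merely approximate the exact $c'_n$, $c''_n$, $a_n$.

```python
import math
import random

def mle_rate(sample):
    # MLE of theta for the assumed density f(x, theta) = theta * exp(-theta * x).
    return len(sample) / sum(sample)

def simulate_statistic(theta_true, theta0, n, reps, rng):
    # Draw `reps` independent values of sqrt(n) * (theta_hat_n - theta0).
    stats = []
    for _ in range(reps):
        sample = [rng.expovariate(theta_true) for _ in range(n)]
        stats.append(math.sqrt(n) * (mle_rate(sample) - theta0))
    return stats

def calibrate(null_stats, alpha):
    # Empirical constants: each region contains a fraction alpha of the null draws.
    m = len(null_stats)
    k = round(alpha * m)
    ordered = sorted(null_stats)
    ordered_abs = sorted(abs(s) for s in null_stats)
    c_upper = ordered[m - k]    # region W'_n : statistic >= c_upper
    c_lower = ordered[k - 1]    # region W''_n: statistic <= c_lower
    a_two = ordered_abs[m - k]  # region W_n  : |statistic| >= a_two
    return c_upper, c_lower, a_two

def rejection_rates(stats, c_upper, c_lower, a_two):
    # Fraction of draws falling in W'_n, W''_n and W_n respectively.
    m = len(stats)
    upper = sum(s >= c_upper for s in stats) / m
    lower = sum(s <= c_lower for s in stats) / m
    two = sum(abs(s) >= a_two for s in stats) / m
    return upper, lower, two

rng = random.Random(0)
theta0, n, alpha = 1.0, 50, 0.05
null_stats = simulate_statistic(theta0, theta0, n, 2000, rng)
constants = calibrate(null_stats, alpha)
size = rejection_rates(null_stats, *constants)
power = rejection_rates(simulate_statistic(1.5, theta0, n, 2000, rng), *constants)
print("constants (c'_n, c''_n, a_n):", constants)
print("size  (W'_n, W''_n, W_n):", size)
print("power (W'_n, W''_n, W_n) at theta = 1.5:", power)
```

For an alternative above $\theta_0$ the one-sided region $W'_n$ should show the largest rejection rate, while $W''_n$ rejects less often than under the hypothesis; this is the behaviour that motivates restricting $\{W'_n\}$ to $\theta \ge \theta_0$ and asking for unbiasedness in the two-sided case.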

2. Assumptions on the density function $f(x, \theta)$.

Assumption 1. For any positive $k$

$\lim_{n \to \infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly in $\theta$, where $P(-k < \hat\theta_n - \theta < k \mid \theta)$ denotes the probability that
$-k < \hat\theta_n - \theta < k$ under the assumption that $\theta$ is the true value of the parameter.

Assumption 1 implies somewhat more than consistency of the maximum likelihood
estimate $\hat\theta_n$. In fact, consistency means only that for any positive $k$

$\lim_{n \to \infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1,$

without asking that the convergence should be uniform in $\theta$. If $\hat\theta_n$ satisfies
Assumption 1 we shall say that $\hat\theta_n$ is a uniformly consistent estimate of $\theta$. A
rigorous proof of the consistency of $\hat\theta_n$ (under certain restrictions on $f(x, \theta)$)
was given by J. L. Doob.³ In an appendix to this paper it will be shown that
under certain conditions $\hat\theta_n$ is uniformly consistent.

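Denote by $E_\theta[\psi(x)]$ the expected value of $\psi(x)$ under the assumption that $\theta$
is the true value of the parameter. That is to say,

$E_\theta[\psi(x)] = \int_{-\infty}^{\infty} \psi(x) f(x, \theta)\, dx.$

For any $x$, for any positive $\delta$, and for any $\theta_1$, denote by $\varphi_1(x, \theta_1, \delta)$ the greatest
lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$ in the
interval $\theta_1 - \delta \le \theta \le \theta_1 + \delta$.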

Assumption 2. There exists a positive value $k_0$ such that the expectations
$E_\theta \varphi_1(x, \theta_1, \delta)$ and $E_\theta \varphi_2(x, \theta_1, \delta)$ exist and are continuous functions of $\theta$, $\theta_1$ and $\delta$


3 J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1937).
in the domain $D$ defined by the inequalities: $0 \le \delta \le \tfrac12 k_0$, $\theta_0 - \tfrac12 k_0 \le \theta_1 \le \theta_0 + \tfrac12 k_0$,
$\theta_0 - k_0 \le \theta \le \theta_0 + k_0$. Furthermore the expectations $E_\theta[\varphi_1(x, \theta_1, \delta)]^2$
and $E_\theta[\varphi_2(x, \theta_1, \delta)]^2$ exist in $D$ and have a finite upper bound in $D$.

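Assumption 3. There exists a positive value $k_0$ such that

$\int_{-\infty}^{\infty} \dfrac{\partial f(x, \theta)}{\partial\theta}\, dx = \int_{-\infty}^{\infty} \dfrac{\partial^2 f(x, \theta)}{\partial\theta^2}\, dx = 0$ for $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.

Assumption 3 means simply that we may differentiate with respect to $\theta$ under
the integral sign. In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\dfrac{\partial}{\partial\theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \dfrac{\partial^2}{\partial\theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain the relations in Assumption 3.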
Assumption 4. There exists a positive $\eta$ and a positive $k_0$ such that

$E_\theta \left| \dfrac{\partial \log f(x, \theta)}{\partial\theta} \right|^{2+\eta}$

exists and has a finite upper bound in the interval $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.
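
As a worked illustration of Assumptions 3 and 4, consider the density $f(x, \theta) = \theta e^{-\theta x}$ for $x > 0$ (and zero otherwise), with $\theta_0 > k_0 > 0$. This family is an assumed example and is not discussed in the paper; the computation below only checks the stated relations for it.

```latex
\begin{align*}
\int_0^\infty \frac{\partial f(x,\theta)}{\partial\theta}\,dx
  &= \int_0^\infty (1-\theta x)\,e^{-\theta x}\,dx
   = \frac{1}{\theta}-\frac{1}{\theta} = 0,\\
\int_0^\infty \frac{\partial^2 f(x,\theta)}{\partial\theta^2}\,dx
  &= \int_0^\infty (\theta x^2-2x)\,e^{-\theta x}\,dx
   = \frac{2}{\theta^2}-\frac{2}{\theta^2} = 0,\\
\frac{\partial \log f(x,\theta)}{\partial\theta} &= \frac{1}{\theta}-x,
\qquad
\frac{\partial^2 \log f(x,\theta)}{\partial\theta^2} = -\frac{1}{\theta^2},\\
E_\theta\!\left[\frac{\partial \log f(x,\theta)}{\partial\theta}\right]^2
  &= E_\theta\!\left(x-\frac{1}{\theta}\right)^{2} = \frac{1}{\theta^2}
   = -E_\theta\,\frac{\partial^2 \log f(x,\theta)}{\partial\theta^2}.
\end{align*}
```

All moments of $x$ are finite and depend continuously on $\theta$, so $E_\theta |1/\theta - x|^{2+\eta}$ is bounded on $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$ for any positive $\eta$, which is what Assumption 4 asks for in this example.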

3. Some propositions. Denote $\sqrt{n}(\hat\theta_n - \theta)$ by $z_n(\theta)$ and denote the probability
$P[z_n(\theta) < t \mid \theta]$ by $\Phi_n(t, \theta)$.

Proposition 1. Within the $\theta$-interval $[\theta_0 - \tfrac12 k_0, \theta_0 + \tfrac12 k_0]$, $\Phi_n(t, \theta)$ converges
with $n \to \infty$ uniformly in $t$ and $\theta$ towards the cumulative normal distribution with
zero mean and variance

$-1 \Big/ E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$

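Proof: In all that follows we assume that $\theta$ takes only values in the interval
$[\theta_0 - k_0, \theta_0 + k_0]$, except when the contrary is explicitly stated. Furthermore
we introduce the variable $\theta_1$ and assume that $\theta_1$ takes only values in the interval
$[\theta_0 - \tfrac12 k_0, \theta_0 + \tfrac12 k_0]$.

Because of Assumption 3 we have

(1)  $E_\theta \dfrac{\partial \log f(x, \theta)}{\partial\theta} = \int_{-\infty}^{\infty} \dfrac{\partial f(x, \theta)}{\partial\theta}\, dx = 0.$

Since

$\dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \dfrac{1}{f(x, \theta)} \dfrac{\partial^2 f(x, \theta)}{\partial\theta^2} - \dfrac{1}{[f(x, \theta)]^2} \left[ \dfrac{\partial f(x, \theta)}{\partial\theta} \right]^2$

we get from Assumption 3

(2)  $E_\theta \left[ \dfrac{\partial \log f(x, \theta)}{\partial\theta} \right]^2 = -E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$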
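Hence

(3)  $d(\theta) = -E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2} > 0.$

Consider the Taylor expansion

(4)  $\dfrac{1}{\sqrt n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = \dfrac{1}{\sqrt n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta_1)}{\partial\theta} + (\theta - \theta_1) \dfrac{1}{\sqrt n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2},$

where $\theta'$ lies in the interval $[\theta_1, \theta]$. Denote $\dfrac{1}{\sqrt n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta_1)}{\partial\theta}$ by $y_n(\theta_1)$.
For $\theta = \hat\theta_n$ the left hand side of (4) is equal to zero. Hence we have

(5)  $y_n(\theta_1) + [\sqrt{n}(\hat\theta_n - \theta_1)] \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0,$

or

(6)  $y_n(\theta_1) + z_n(\theta_1) \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0.$

Let $Q_n(\theta_1)$ be the region defined by the inequality

(7)  $\left| \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} + d(\theta_1) \right| < \nu,$

where $\nu$ denotes a positive number less than the greatest lower bound of $d(\theta_1)$.
We shall prove that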


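(8)  $\lim_{n \to \infty} P[Q_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Let $\tau_0$ be a positive number such that

(9)  $\left| E_{\theta_1} \varphi_i(x, \theta_1, \tau_0) - E_{\theta_1} \dfrac{\partial^2 \log f(x, \theta_1)}{\partial\theta^2} \right| < \dfrac{\nu}{2}, \qquad (i = 1, 2)$

for all values of $\theta_1$. Because of Assumption 2 such a $\tau_0$ certainly exists.
Denote by $R_n(\theta_1)$ the region defined by the inequality

(10)  $|\hat\theta_n - \theta_1| \le \tau_0.$

On account of Assumption 1

(11)  $\lim_{n \to \infty} P[R_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Since $\theta'$ lies in the interval $[\theta_1, \hat\theta_n]$, we have

(12)  $|\theta' - \theta_1| \le \tau_0$

for all points in $R_n(\theta_1)$. Hence at any point in $R_n(\theta_1)$ the inequality

(13)  $\sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) \le \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} \le \sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0)$

holds.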






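Let $S_n(\theta_1)$ be defined by the inequality

(14)  $\left| \dfrac{1}{n} \sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_1(x, \theta_1, \tau_0) \right| < \dfrac{\nu}{2}$

and $T_n(\theta_1)$ by the inequality

(15)  $\left| \dfrac{1}{n} \sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_2(x, \theta_1, \tau_0) \right| < \dfrac{\nu}{2}.$

On account of Assumption 2 we have

(16)  $\lim_{n \to \infty} P[S_n(\theta_1) \mid \theta_1] = \lim_{n \to \infty} P[T_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$.

Denote by $U_n(\theta_1)$ the common part of the regions $R_n(\theta_1)$, $S_n(\theta_1)$ and $T_n(\theta_1)$.
In $U_n(\theta_1)$ we have on account of (9), (14) and (15)

(17)  $\left| \dfrac{1}{n} \sum_\alpha \varphi_i(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \dfrac{\partial^2 \log f(x, \theta_1)}{\partial\theta^2} \right| < \nu \qquad (i = 1, 2).$

From this we obtain (7) because of (13). That is to say, the inequality (7) is
valid everywhere in $U_n(\theta_1)$. Since

$\lim_{n \to \infty} P[U_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$, our statement about $Q_n(\theta_1)$ is proved. From (6) and (7) we
get that everywhere in $Q_n(\theta_1)$ the inequalities hold: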


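(18)  $\dfrac{y_n(\theta_1)}{d(\theta_1) + \nu} \le z_n(\theta_1) \le \dfrac{y_n(\theta_1)}{d(\theta_1) - \nu}$  if $y_n(\theta_1) \ge 0$;

(19)  $\dfrac{y_n(\theta_1)}{d(\theta_1) + \nu} \ge z_n(\theta_1) \ge \dfrac{y_n(\theta_1)}{d(\theta_1) - \nu}$  if $y_n(\theta_1) \le 0$.

Let $z^*_n(\theta_1)$ be defined as follows: $z^*_n(\theta_1) = z_n(\theta_1)$ at any point in $Q_n(\theta_1)$, and
$z^*_n(\theta_1) = y_n(\theta_1)/d(\theta_1)$ at any point outside $Q_n(\theta_1)$.

On account of (8) we obviously have

(20)  $\lim_{n \to \infty} \{ P[z^*_n(\theta_1) < t \mid \theta_1] - P[z_n(\theta_1) < t \mid \theta_1] \} = 0$

uniformly in $t$ and $\theta_1$.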

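From equation (1) it follows that $E_{\theta_1} y_n(\theta_1) = 0$. From Assumption 4 it follows
on account of the general limit theorems

(21)  $\lim_{n \to \infty} \left\{ P[y_n(\theta_1) < t \mid \theta_1] - \dfrac{1}{\sqrt{2\pi d(\theta_1)}} \int_{-\infty}^{t} e^{-\frac12 v^2 / d(\theta_1)}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Hence

$\lim_{n \to \infty} \left\{ P\left[ \dfrac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - \sqrt{\dfrac{d(\theta_1)}{2\pi}} \int_{-\infty}^{t} e^{-\frac12 d(\theta_1) v^2}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Since $\nu$ can be chosen arbitrarily small, we get easily
from (18), (19), (20) and (21)

(22)  $\lim_{n \to \infty} \left\{ P\left[ \dfrac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - P[z_n(\theta_1) < t \mid \theta_1] \right\} = 0$

uniformly in $t$ and $\theta_1$. Proposition 1 follows from (21) and (22).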
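
Proposition 1 can be checked numerically in a simple case. The sketch below assumes the exponential density $f(x, \theta) = \theta e^{-\theta x}$, for which $\hat\theta_n = n / \sum x_\alpha$ and $-1/E_\theta[\partial^2 \log f / \partial\theta^2] = \theta^2$; the model, the sample sizes and the function names are illustrative assumptions and are not taken from the paper. It compares the simulated distribution of $z_n(\theta)$ with the normal limit on a grid of values of $t$.

```python
import bisect
import math
import random

def normal_cdf(t, variance):
    # Cumulative normal distribution with zero mean and the given variance.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0 * variance)))

def simulate_z(theta, n, reps, rng):
    # Simulated values of z_n(theta) = sqrt(n) * (theta_hat_n - theta),
    # where theta_hat_n = n / sum(x) is the MLE for the assumed exponential density.
    values = []
    for _ in range(reps):
        total = sum(rng.expovariate(theta) for _ in range(n))
        values.append(math.sqrt(n) * (n / total - theta))
    return values

def max_cdf_gap(values, variance, grid):
    # Largest gap on `grid` between the empirical cdf P[z_n < t] and the normal limit.
    ordered = sorted(values)
    gap = 0.0
    for t in grid:
        empirical = bisect.bisect_left(ordered, t) / len(ordered)
        gap = max(gap, abs(empirical - normal_cdf(t, variance)))
    return gap

rng = random.Random(1)
theta = 2.0
limit_variance = theta ** 2  # -1 / E[d^2 log f / d theta^2] for this density
grid = [0.25 * j for j in range(-24, 25)]
gap_small = max_cdf_gap(simulate_z(theta, 25, 3000, rng), limit_variance, grid)
gap_large = max_cdf_gap(simulate_z(theta, 400, 3000, rng), limit_variance, grid)
print("max gap, n = 25 :", gap_small)
print("max gap, n = 400:", gap_large)
```

The gap includes Monte Carlo noise of the order of $1/\sqrt{3000}$, so it is expected to shrink with $n$ only down to that level.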

Proposition 2. Let $\{W_n\}$ be a sequence of regions of size $\alpha$, i.e. $P(W_n \mid \theta_0) = \alpha$,
and let $V_n(z)$ be the region defined by the inequality

$(\hat\theta_n - \theta_0)\sqrt{n} < z.$

Let $U_n(z)$ be the intersection of $V_n(z)$ and $W_n$, and denote $P[U_n(z) \mid \theta_0]$ by $F_n(z)$.
Denote furthermore $P[W_n \mid \theta_0 + \mu/\sqrt{n}]$ by $G(\mu, n)$. If $F_n(z)$ converges to $F(z)$
and if $\lim_{n \to \infty} \mu_n = \mu$, then

(23)  $\lim_{n \to \infty} G(\mu_n, n) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

where

$c = -1 \Big/ E_{\theta_0} \dfrac{\partial^2 \log f(x, \theta_0)}{\partial\theta^2}.$

Proof: First we show

(24)  $\int_{-\infty}^{\infty} dF(z) = \alpha.$

Denote $P[V_n(z) \mid \theta_0]$ by $\Phi_n(z)$. On account of Proposition 1, $\Phi_n(z)$ converges
uniformly to the cumulative normal distribution $\psi(z)$ with zero mean and
variance $c$. It is obvious that

(25)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1)$  for $z_2 > z_1$.

Hence

(26)  $F(z_2) - F(z_1) \le \psi(z_2) - \psi(z_1)$  for $z_2 > z_1$.

From (25) we get

(27)  $[\lim_{z \to \infty} F_n(z)] - F_n(z) = \alpha - F_n(z) \le 1 - \Phi_n(z).$

Hence

(28)  $\alpha - F(z) \le 1 - \psi(z).$

Since $F_n(z) \le \alpha$ and therefore also $F(z) \le \alpha$, we get from (28)

$0 \le \alpha - F(z) \le 1 - \psi(z).$

Hence

(29)  $\lim_{z \to \infty} F(z) = \alpha.$

Since $F_n(z) \le \Phi_n(z)$, we have $F(z) \le \psi(z)$, and therefore

(30)  $\lim_{z \to -\infty} F(z) = 0.$

The equation (24) follows from (29) and (30).

It follows easily from (26) that the integral on the right hand side of the
equation (23) exists and is finite.

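Let us denote $\theta_0 + \mu_n/\sqrt{n}$ by $\theta_n$. Consider the Taylor expansions

(31)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_0 - \hat\theta_n) \sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_0 - \hat\theta_n)^2 \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n)$

and

(32)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_n - \hat\theta_n) \sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_n - \hat\theta_n)^2 \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta''_n)$

where $\theta'_n$ lies in the interval $[\theta_0, \hat\theta_n]$ and $\theta''_n$ lies in the interval $[\theta_n, \hat\theta_n]$. Since
$\hat\theta_n$ is the maximum likelihood estimate, we get from (31) and (32)

(33)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_0 - \hat\theta_n)^2 \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n),$

(34)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_n - \hat\theta_n)^2 \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta''_n).$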

Denote by $\beta$ a real variable which can take any value between $-2\mu$ and $+2\mu$.
Denote by R n the region defined by the inequality 

(35) | &n - 001 < n - *. 

From Proposition 1 it follows easily that 

(36)  $\lim_{n \to \infty} P(R_n \mid \theta_0 + \beta/\sqrt{n}) = 1$

uniformly in p. Denote 2n“* by r„ . Then for almost all n the following 
inequalities hold at any point in R n : 

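(37)  $\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n),$

(38)  $\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta''_n) \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n).$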

Denote by $S_n$ the region in which (36), (37) and (38) simultaneously hold. It
is obvious that

$\lim_{n \to \infty} P(S_n \mid \theta_0 + \beta/\sqrt{n}) = 1$

uniformly in $\beta$. Denote $\theta_0 + \beta/\sqrt{n}$ by $\theta_n(\beta)$. From Assumption 2 it follows
easily that

(2J j 00 j r n ) 1 i 

(39) lim -i-= B u ~ log f(x, 0,) - — (i = 1,2) 

n-«o ( n ) off* c 

uniformly in /9. Furthermore the variance of Z <9, ^ a:< * ’ l Tn ^ if 0„(/9) is the 

at W 

true value of the parameter 0, converges to zero with n —► * uniformly in (3. 
Hence a sequence {X„j, (n = 1, 2, • • • , ad inf.), of positive numbers can be 
given such that 

(40) lim X n = 0 

f»—00 

and 

(41) lim P[T n | 0 n m « 1 
uniformly in 0, where the region T n is defined by the inequality 

(42) I Z ^ g, - 0, r - n - ) + ~ < X.»“* it = 1, 2). 

la n c 

From (37) and (38) it follows that in the intersection T' n of T n and S n 

(43) | \ Z L log/(x„, 0») + 1 < X B rT* 

n a ofr c 

and 

(44) rZi log f(x a , 0n) + 1 < X„ n -1 . 

! n a otr c 

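We get from (33), (34), (35), (43) and (44) that at any point in $T'_n$

(45)  $\sum_\alpha \log f(x_\alpha, \theta_n) - \sum_\alpha \log f(x_\alpha, \theta_0) = \dfrac{n}{2c} [(\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2] + \lambda'_n,$

where $|\lambda'_n| \le \rho \lambda_n$, and $\rho$ denotes a constant not depending on $n$.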

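On account of (36) and (41) we have

(46)  $\lim_{n \to \infty} P[T'_n \mid \theta_n(\beta)] = 1$

uniformly in $\beta$.

Denote by $T''_n(z)$ the intersection of $U_n(z)$ (defined in Proposition 2) and $T'_n$.
Denote furthermore $P[T''_n(z) \mid \theta_0]$ by $F^*_n(z)$.

Since

$n[(\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2] = n[(\theta_0 - \hat\theta_n)^2 - (\theta_0 - \hat\theta_n + \mu_n/\sqrt{n})^2] = -\mu_n^2 + 2\sqrt{n}\,\mu_n(\hat\theta_n - \theta_0),$

we get from (45) and (46)

(47)  $\lim_{n \to \infty} \left\{ P[T''_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF^*_n(t) \right\} = 0$

uniformly in $z$. It is obvious that

(48)  $\lim_{n \to \infty} \{ P[T''_n(z) \mid \theta_n] - P[U_n(z) \mid \theta_n] \} = 0$

uniformly in $z$. Hence we get from (47)

(49)  $\lim_{n \to \infty} \left\{ P[U_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF^*_n(t) \right\} = 0$

uniformly in $z$. It follows from (49) that for any positive $L$

(50)  $\lim_{n \to \infty} \left\{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] - \int_{-L}^{L} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF^*_n(t) \right\} = 0.$

Since $\lim_{n \to \infty} \mu_n = \mu$, $\lim_{n \to \infty} [F^*_n(t) - F_n(t)] = 0$ uniformly in $t$, and since $\lim_{n \to \infty} F_n(t) = F(t)$
uniformly in $t$, we get from (50)

(51)  $\lim_{n \to \infty} \{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] \} = \int_{-L}^{L} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t).$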

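Now let us calculate the limit of $P[V_n(z) \mid \theta_n]$ if $n \to \infty$. The region $V_n(z)$ is
defined by the inequality

(52)  $(\hat\theta_n - \theta_0)\sqrt{n} < z.$

This inequality can be written as follows:

(53)  $(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n.$

Since $\lim \mu_n = \mu$, we get on account of Proposition 1

(54)  $\lim_{n \to \infty} P[(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n \mid \theta_n] = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z - \mu} e^{-\frac12 t^2/c}\, dt = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac12 (t - \mu)^2/c}\, dt.$

Hence

(55)  $\lim_{n \to \infty} P[V_n(z) \mid \theta_n] = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac12 (t - \mu)^2/c}\, dt$

uniformly in $z$.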

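For any positive $\epsilon$ let $L_\epsilon$ denote the positive number satisfying the condition:

(56)  $\dfrac{1}{\sqrt{2\pi c}} \left[ \int_{-\infty}^{-L_\epsilon} e^{-\frac12 (t - \mu)^2/c}\, dt + \int_{L_\epsilon}^{\infty} e^{-\frac12 (t - \mu)^2/c}\, dt \right] = \dfrac{\epsilon}{2}.$

From (56) we easily get on account of (26)

(57)  $0 \le \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) - \int_{-L_\epsilon}^{L_\epsilon} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) \le \dfrac{\epsilon}{2}.$

Since the region $U_n(z_2) - U_n(z_1)$ is a subset of $V_n(z_2) - V_n(z_1)$ for $z_2 > z_1$,
we have on account of (55) and (56)

(58)  $\limsup_{n \to \infty} | \{ P[U_n(\infty) \mid \theta_n] - P[U_n(L_\epsilon) \mid \theta_n] + P[U_n(-L_\epsilon) \mid \theta_n] \} | \le \dfrac{\epsilon}{2}.$

Since

$P[U_n(\infty) \mid \theta_n] = G(\mu_n, n),$

we have

(59)  $\limsup_{n \to \infty} | G(\mu_n, n) - \{ P[U_n(L_\epsilon) \mid \theta_n] - P[U_n(-L_\epsilon) \mid \theta_n] \} | \le \dfrac{\epsilon}{2}.$

From (51), (57) and (59) we get

(60)  $\limsup_{n \to \infty} \left| G(\mu_n, n) - \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) \right| \le \epsilon.$

Since $\epsilon$ can be chosen arbitrarily small, Proposition 2 is proved.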

4. Theorems on asymptotically most powerful tests. 

Theorem 1: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided the parameter $\theta$ is restricted
to values $\ge \theta_0$.

Proof: Assume that there exists a test $\{W_n\}$ of size $\alpha$ such that

(61)  $\limsup_{n \to \infty} L(W_n, M_n) = \delta > 0.$

Then there exists a subsequence $\{n'\}$ of the sequence $\{n\}$ and a sequence $\{\theta_{n'}\}$
of parameter values $\ge \theta_0$ such that

(62)  $\lim \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$

The expression

(63)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'} \ge 0$

must be bounded. This can be proved as follows: Since under the assumption
$\theta = \theta_0$ the distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ converges to a normal distribution with
zero mean and finite variance, the sequence $\{A_n\}$ must be bounded. Hence $M_n$
is defined by the inequality

(64)  $\hat\theta_n - \theta_0 \ge A_n/\sqrt{n} = \epsilon_n,$

where

(65)  $\lim \epsilon_n = 0.$

From Assumption 1, (64) and (65) it follows easily that if

$\lim \theta_{n'} = \theta_1 > \theta_0$, then $\lim P(M_{n'} \mid \theta_{n'}) = 1.$

Hence on account of (62) we must have

(66)  $\lim \theta_{n'} = \theta_0.$

If there existed a subsequence $\{n^*\}$ of $\{n'\}$ such that $\lim \mu_{n^*} = \infty$, then
on account of (66) and Proposition 1 we would have $\lim P(M_{n^*} \mid \theta_{n^*}) = 1$,
which is in contradiction to (62). Hence the expression (63) must be bounded.
Let $\{n''\}$ be a subsequence of $\{n'\}$ such that

(67)  $\lim \mu_{n''} = \mu \ge 0.$
Denote by $F_n(z)$ the probability of the intersection of $W_n$ and the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis that $\theta = \theta_0$. Consider a subsequence
$\{n'''\}$ of the sequence $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$
towards a function $F(z)$. The existence of such a subsequence $\{n'''\}$ can be
proved as follows: Denote the probability $P[(\hat\theta_n - \theta_0)\sqrt{n} < z \mid \theta_0]$ by $\Phi_n(z)$.
On account of Proposition 1, $\Phi_n(z)$ converges with $n \to \infty$ uniformly in $z$ towards

(68)  $\psi(z) = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac12 t^2/c}\, dt$

where $c$ has the same value as in (23).

We obviously have

(69)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1)$

for any pair of values $z_1$, $z_2$ for which $z_2 > z_1$. Hence

(70)  $\limsup_{n \to \infty} [F_n(z_2) - F_n(z_1)] \le \psi(z_2) - \psi(z_1).$

Since $F_n(z)$ is a monotonic function of $z$, our statement follows easily from (70)
and the fact that $\psi(z)$ is uniformly continuous. Hence on account of Proposition 2
we have

(71)  $\lim P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

and

(72)  $\lim P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d\Phi(z)$

where

(73)  $\Phi(z) = 0$  for $z \le z_0$,

(74)  $\Phi(z) = \psi(z) - \psi(z_0)$  for $z \ge z_0$,

and $z_0$ is given by

(75)  $1 - \psi(z_0) = \alpha.$

From (62), (71) and (72) we get

(76)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta > 0.$

Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$
be a critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single observation
on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by $D(v)$ the intersection
of $B$ and the region $C(v)$ defined by the inequality $y < v$. Denote by
$H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then the power of
the test $B$ with respect to the alternative $\nu = \mu$ is given by the following expression

(77)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ is given by the inequality $y \ge v_0$ where $v_0$ is chosen such that the
size of $B$ is equal to $\alpha$, then $H(v) = \Phi(v)$ where the function $\Phi$ is defined by the
equations (73), (74) and (75). Since the latter test is uniformly most powerful⁴
with respect to all alternatives $\nu > 0$, for any positive $\mu$ the inequality

(78)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac12 t^2/c}\, dt.$

It is obvious that

(79)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$

and

(80)  $\int_{-\infty}^{\infty} dH(v) = \alpha.$

4 See for instance J. Neyman and E. S. Pearson, “Contributions to the theory of testing 
statistical hypotheses,” Stat. Res. Memoirs, Vol. 1 (1936). 
On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative function
of $v$ such that

(79')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$

and

(80')  $\int_{-\infty}^{\infty} dK(v) = \alpha$

hold, then there exists a sequence $\{B^{(i)}\}$ $(i = 1, 2, \dots,$ ad inf.$)$ of regions of
size $\alpha$, with corresponding functions $H^{(i)}(v)$, such that

$\lim_{i \to \infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (78) holds for $H(v) = H^{(i)}(v)$ and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$,

it is easy to see that (78) will hold also for $H(v) = K(v)$. Hence for any monotonically
non-decreasing non-negative function $K(v)$ for which (79') and (80')
are fulfilled, also (78) must hold. Since $F(v)$ is a distribution function which
satisfies (79') and (80'), we have a contradiction to (76). This proves Theorem 1.
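
The limiting power expressions (72) and (77) can be evaluated numerically. The sketch below is an illustration under assumptions not made in the paper: the values $c = 2$, $\mu = 1.5$, $\alpha = 0.05$, the midpoint rule, the truncation of the infinite range, and all function names are chosen for the example. It integrates $e^{-\frac12(\mu^2 - 2\mu z)/c}$ against the normal distribution $\psi$ over a rejection set, compares the one-sided result with the closed form $1 - \psi(z_0 - \mu)$, and checks inequality (78) for one competing region of the same size.

```python
import math
from statistics import NormalDist

def limiting_power(mu, c, segments, steps=4000):
    # Midpoint-rule value of the integral, over the rejection set `segments`, of
    # exp(-(mu^2 - 2*mu*z) / (2c)) times the normal density with mean 0, variance c.
    total = 0.0
    for lo, hi in segments:
        h = (hi - lo) / steps
        for j in range(steps):
            z = lo + (j + 0.5) * h
            tilt = math.exp(-0.5 * (mu * mu - 2.0 * mu * z) / c)
            density = math.exp(-0.5 * z * z / c) / math.sqrt(2.0 * math.pi * c)
            total += tilt * density * h
    return total

alpha, c, mu = 0.05, 2.0, 1.5
psi = NormalDist(0.0, math.sqrt(c))   # limiting distribution under the hypothesis
far = 12.0 * math.sqrt(c) + abs(mu)   # truncation point standing in for infinity

z0 = psi.inv_cdf(1.0 - alpha)         # one-sided constant, 1 - psi(z0) = alpha
one_sided = limiting_power(mu, c, [(z0, far)])
closed_form = 1.0 - psi.cdf(z0 - mu)

z1 = psi.inv_cdf(1.0 - alpha / 2.0)   # a competing two-sided region of size alpha
two_sided = limiting_power(mu, c, [(-far, -z1), (z1, far)])
size_check = limiting_power(0.0, c, [(z0, far)])

print("one-sided limiting power:", one_sided, "closed form:", closed_form)
print("two-sided limiting power:", two_sided)
print("size at mu = 0:", size_check)
```

For a positive $\mu$ the two-sided value should not exceed the one-sided one, which is a single instance of (78) and not a proof of it.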

Theorem 2: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided that the parameter $\theta$ is restricted
to values $\le \theta_0$.

We omit the proof since it is entirely analogous to that of Theorem 1.

Theorem 3: Let $M_n$ be the region consisting of all points which satisfy at least
one of the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \le -A_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \ge A_n.$

The constant $A_n > 0$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an
asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$.

Proof: Assume that there exists a sequence $\{W_n\}$ $(n = 1, 2, \dots,$ ad inf.$)$
of regions such that

(81)  $P(W_n \mid \theta_0) = \alpha,$

(82)  $\lim_{n \to \infty} g(W_n) = \alpha$

and

(83)  $\limsup_{n \to \infty} L(W_n, M_n) = \delta > 0.$

We shall deduce a contradiction from this assumption. On account of (83)
there exists a subsequence $\{n'\}$ of $\{n\}$ and a sequence $\{\theta_{n'}\}$ of parameter values such that

(84)  $\lim \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$

The expression

(85)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'}$

must be bounded. The proof of this statement is omitted, since it is analogous
to the proof of the similar statement about (63). Hence there exists a subsequence
$\{n''\}$ of $\{n'\}$ such that

(86)  $\lim \mu_{n''} = \mu.$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ with the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis $\theta = \theta_0$. Consider a subsequence $\{n'''\}$
of $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$ towards a function $F(z)$.
The existence of such a sequence $\{n'''\}$ can be proved in the same way as the
similar statement in the proof of Theorem 1. Hence on account of Proposition 2
and (86) we have

(87)  $\lim P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

and

(88)  $\lim P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d\Phi(z)$

where

(89)  $\Phi(z) = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac12 t^2/c}\, dt$  for $z \le -z_0$,

(90)  $\Phi(z) = \Phi(-z_0)$  for $-z_0 \le z \le z_0$,

(91)  $\Phi(z) = \Phi(-z_0) + \dfrac{1}{\sqrt{2\pi c}} \int_{z_0}^{z} e^{-\frac12 t^2/c}\, dt$  for $z \ge z_0$,

and

(92)  $\Phi(-z_0) = \tfrac12 \alpha.$

From (84), (87) and (88) it follows that

(93)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta.$

Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$ be
an unbiased critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single
observation on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by
$D(v)$ the intersection of $B$ with the region $C(v)$ defined by the inequality $y < v$.
Denote by $H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then
the power of the test $B$ with respect to the alternative $\nu = \mu$ is given by

(94)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ consists of all points which satisfy at least one of the inequalities
$y \le -v_0$, $y \ge v_0$, and if $v_0 > 0$ is chosen such that the size of $B$ is equal to $\alpha$,
then $H(v) = \Phi(v)$ where $\Phi(v)$ is defined by the equations (89)-(92). Since the
latter test is a uniformly most powerful unbiased test,⁵ for any $\mu$ the inequality

(95)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \dfrac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac12 t^2/c}\, dt.$

It is obvious that

(96)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$,

(97)  $\int_{-\infty}^{\infty} dH(v) = \alpha$

and

(98)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v)$ has a minimum for $\mu = 0$.

On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative function
of $v$ such that

(96')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$,

(97')  $\int_{-\infty}^{\infty} dK(v) = \alpha,$

(98')  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dK(v)$ has a minimum for $\mu = 0$,

then there exists a sequence $\{B^{(i)}\}$ $(i = 1, 2, \dots,$ ad inf.$)$ of unbiased regions
of size $\alpha$, with corresponding functions $H^{(i)}(v)$, such that

$\lim_{i \to \infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (95) holds for $H(v) = H^{(i)}(v)$ $(i = 1, 2, \dots,$ ad inf.$)$,
and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$  for $v_2 > v_1$,

it is easy to see that (95) holds also for $H(v) = K(v)$. Hence for any monotonically
non-decreasing non-negative function $K(v)$ for which (96'), (97'), and
(98') are fulfilled, also (95) must be fulfilled if we substitute $K(v)$ for $H(v)$.


5 J. Neyman and E. S. Pearson, l. c., p. 29.
Since $F(v)$ is a distribution function which satisfies (96'), (97') and (98'), we
have a contradiction to (93). This proves Theorem 3.
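
The role of unbiasedness in Theorem 3 can be seen from the limiting power functions in closed form. The sketch below is an illustration only: the values $c = 2$ and $\alpha = 0.05$, the particular unequal-tailed competitor, and the function names are assumptions made for the example. It evaluates the power of the symmetric region of (89)-(92) on a grid of $\mu$ and compares it with a region of the same size whose two tails have probabilities $\alpha/4$ and $3\alpha/4$.

```python
import math
from statistics import NormalDist

def two_sided_power(mu, c, lower, upper):
    # Limiting power of the region {z <= lower or z >= upper} when z is normally
    # distributed with mean mu and variance c.
    dist = NormalDist(mu, math.sqrt(c))
    return dist.cdf(lower) + 1.0 - dist.cdf(upper)

alpha, c = 0.05, 2.0
psi = NormalDist(0.0, math.sqrt(c))

# Symmetric region: each tail has probability alpha / 2 under the hypothesis.
z0 = psi.inv_cdf(1.0 - alpha / 2.0)
# Unequal-tailed competitor of the same size (an assumed example).
lower_b = psi.inv_cdf(alpha / 4.0)
upper_b = psi.inv_cdf(1.0 - 3.0 * alpha / 4.0)

mus = [k / 10.0 for k in range(-30, 31)]
symmetric = [two_sided_power(m, c, -z0, z0) for m in mus]
unequal = [two_sided_power(m, c, lower_b, upper_b) for m in mus]

mu_at_minimum = mus[symmetric.index(min(symmetric))]
print("symmetric region: minimum power", min(symmetric), "at mu =", mu_at_minimum)
print("unequal region: power at mu = -0.2 is", two_sided_power(-0.2, c, lower_b, upper_b))
```

The symmetric region has its smallest power at $\mu = 0$, in agreement with (98), whereas the unequal-tailed region falls below $\alpha$ for some negative $\mu$ and is therefore excluded by the unbiasedness requirement.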


5. Appendix. Proof of the uniform consistency of $\hat\theta_n$. It will be shown here
that under certain conditions on the density function $f(x, \theta)$, Assumption 1,
i.e. uniform consistency of $\hat\theta_n$, can be proved.

For any open subset $\omega$ of the $\theta$-axis we denote by $\varphi(x, \omega)$ the least upper
bound, and by $\psi(x, \omega)$ the greatest lower bound of $\dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$ with respect
to $\theta$ in the set $\omega$. For any function $\lambda(x)$ we denote by $E_\theta \lambda(x)$ the expected value
of $\lambda(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$E_\theta \lambda(x) = \int_{-\infty}^{\infty} \lambda(x) f(x, \theta)\, dx.$

Denote furthermore by $P(\hat\theta_n \in \omega \mid \theta)$ the probability that $\hat\theta_n$ will fall in $\omega$ under
the assumption that $\theta$ is the true value of the parameter. Finally denote by $\Omega$
the parameter space and assume that $\Omega$ is either the whole real axis or a subset of it.

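Proposition 3. $\hat\theta_n$ is a uniformly consistent estimate of $\theta$, i.e. for any positive $k$

$\lim_{n \to \infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly for all $\theta$ in $\Omega$, if the following two conditions are fulfilled:

Condition I. For all values $\theta$ in $\Omega$

$\int_{-\infty}^{\infty} \dfrac{\partial f(x, \theta)}{\partial\theta}\, dx = \int_{-\infty}^{\infty} \dfrac{\partial^2 f(x, \theta)}{\partial\theta^2}\, dx = 0.$

Condition II. For any value $\theta$ in $\Omega$ there exists an open interval $\omega(\theta)$ containing $\theta$
and having the following three properties:

II_a. $\lim_{n \to \infty} P[\hat\theta_n \in \omega(\theta) \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$.

II_b. $E_\theta \varphi^2[x, \omega(\theta)]$ is a bounded function of $\theta$ in $\Omega$, and the least upper bound $A$ of
$E_\theta \varphi[x, \omega(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II_c. $E_\theta \psi[x, \omega(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.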

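Condition I means simply that we may differentiate under the integral sign.
In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\dfrac{\partial}{\partial\theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \dfrac{\partial^2}{\partial\theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain Condition I.
In case that $\omega(\theta)$ is the whole axis, Condition II_a reduces to the condition
that $\hat\theta_n$ exists.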

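In order to prove Proposition 3, we show first that for any positive $\eta$

(99)  $\lim_{n \to \infty} P\left[ \left| \dfrac{1}{n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta)}{\partial\theta} \right| < \eta \,\Big|\, \theta \right] = 1$

uniformly for all $\theta$ in $\Omega$. We have on account of Condition I

(100)  $E_\theta \dfrac{\partial \log f(x, \theta)}{\partial\theta} = \int_{-\infty}^{\infty} \dfrac{\partial f(x, \theta)}{\partial\theta}\, dx = 0.$

Since

$\dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \dfrac{1}{f(x, \theta)} \dfrac{\partial^2 f(x, \theta)}{\partial\theta^2} - \dfrac{1}{[f(x, \theta)]^2} \left[ \dfrac{\partial f(x, \theta)}{\partial\theta} \right]^2$

we have on account of Condition I

(101)  $E_\theta \left[ \dfrac{\partial \log f(x, \theta)}{\partial\theta} \right]^2 = -E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$

According to Condition II, $E_\theta \psi[x, \omega(\theta)] < 0$ and is a bounded function of $\theta$.
Since $E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2} \le E_\theta \varphi[x, \omega(\theta)] \le A < 0$ and $\ge E_\theta \psi[x, \omega(\theta)]$, the left hand side of (101), i.e.
the variance of $\dfrac{\partial \log f(x, \theta)}{\partial\theta}$, is a bounded function of $\theta$. From this and the
equation (100) we obtain easily (99). Consider the Taylor expansion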

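(102)  $\dfrac{1}{n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = (\theta - \hat\theta_n) \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2}$

where $\theta'_n$ lies in the interval $[\theta, \hat\theta_n]$. Let $\epsilon$ be an arbitrary positive number and
denote by $Q_n(\theta)$ the region defined by the inequality

(103)  $\left| \dfrac{1}{n} \sum_\alpha \dfrac{\partial \log f(x_\alpha, \theta)}{\partial\theta} \right| < \epsilon.$

On account of (99) we have

(104)  $\lim_{n \to \infty} P[Q_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$.

Denote by $R_n(\theta)$ the region defined by the inequality

(105)  $\dfrac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)] < \tfrac12 A < 0.$

On account of Condition II_b

(106)  $\lim_{n \to \infty} P[R_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Denote by $B_n(\theta)$ the region in which $\hat\theta_n \in \omega(\theta)$.
Since in $B_n(\theta)$

$\dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2} \le \dfrac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)],$

we have in the intersection $R'_n(\theta)$ of $R_n(\theta)$ and $B_n(\theta)$

(107)  $\left| \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2} \right| > \tfrac12 |A|.$

Denote by $U_n(\theta)$ the intersection of $Q_n(\theta)$ and $R'_n(\theta)$. It is obvious that

(108)  $\lim_{n \to \infty} P[U_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. From (102), (103) and (107) we get that in $U_n(\theta)$

(109)  $|\hat\theta_n - \theta| < \dfrac{2\epsilon}{|A|}.$

Hence on account of (108)

$\lim_{n \to \infty} P\left( |\hat\theta_n - \theta| < \dfrac{2\epsilon}{|A|} \,\Big|\, \theta \right) = 1$

uniformly for all $\theta$ in $\Omega$. Since $\epsilon$ can be chosen arbitrarily, Proposition 3 is
proved.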

Conditions I and II are sufficient but not necessary for the uniform consistency
of $\hat\theta_n$. For sufficiently small $\omega(\theta)$ the conditions II_b and II_c are rather
weak. In fact, on account of (101) we have

$E_\theta \dfrac{\partial^2 \log f(x, \theta)}{\partial\theta^2} < 0.$

Hence for sufficiently small intervals $\omega(\theta)$, under certain continuity conditions,
also $E_\theta \varphi[x, \omega(\theta)]$ will be negative. However, in some cases it may be difficult to
verify II_a for small $\omega(\theta)$. On the other hand, for sufficiently large $\omega(\theta)$ (certainly
for $\omega(\theta) = [-\infty, +\infty]$) II_a can easily be verified, but the conditions II_b
and II_c might be unnecessarily strong. In cases where II_b or II_c does not hold
for $\omega(\theta) = [-\infty, +\infty]$ and the validity of II_a is not apparent, the following
Lemma may be useful:

Lemma: Proposition 3 remains valid if we substitute for Condition II the conditions

II'. Denote by $T_n$ the set of all points at which $\hat\theta_n$ exists and

(110)  $\sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = 0$

has at most one solution in $\theta^*$. Then $\lim_{n \to \infty} P[T_n \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$, and

II''. There exists a positive $k$ such that for $\omega(\theta) = I(\theta) = (\theta - k, \theta + k)$ the
following two conditions hold:

II''_b. $E_\theta \varphi^2[x, I(\theta)]$ is a bounded function of $\theta$ in $\Omega$ and the least upper bound $A$
of $E_\theta \varphi[x, I(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II''_c. $E_\theta \psi[x, I(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

In cases where II_b or II_c is not fulfilled for $\omega(\theta) = [-\infty, +\infty]$ the verification
of II' and II'' may be easier than that of II.

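Our Lemma can be proved as follows: Consider the Taylor expansion

(111)  $\dfrac{1}{n} \sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = \dfrac{1}{n} \sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \theta) + (\theta^* - \theta) \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta')$

where $\theta'$ lies in $[\theta, \theta^*]$. Denote by $V_n(\theta)$ the region defined by

(112)  $\dfrac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] < \tfrac12 A < 0.$

On account of II''_b we have

(113)  $\lim_{n \to \infty} P[V_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Let $W_n(\theta)$ be the region defined by

(114)  $\left| \dfrac{1}{n} \sum_\alpha \dfrac{\partial}{\partial\theta} \log f(x_\alpha, \theta) \right| < \epsilon.$

From Condition I and Condition II'' it follows easily that

(115)  $\lim_{n \to \infty} P[W_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. For all values $\theta^*$ in the interval $I(\theta)$ we have

(116)  $\dfrac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] \ge \dfrac{1}{n} \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta').$

Because of (112) and (116) we have in $V_n(\theta)$

(117)  $\dfrac{1}{n} \sum_\alpha \dfrac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta') < \tfrac12 A < 0$

for all values $\theta^*$ in the interval $I(\theta)$. Let $\epsilon$ be less than $|\tfrac12 kA|$. Then in the
intersection $W'_n(\theta)$ of the regions $V_n(\theta)$ and $W_n(\theta)$ we obviously have on account
of (114) that the values of the left hand side of (111) for $\theta^* = \theta + k$ and $\theta^* = \theta - k$
will be of opposite sign. Hence at any point of $W'_n(\theta)$ the equation (110)
has at least one root which lies in the interval $I(\theta)$. Since (110) has at most
one root in $T_n$ and since $\hat\theta_n$ is a root of (110), we get that at any point of the
intersection $W''_n(\theta)$ of $W'_n(\theta)$ and $T_n$, $\hat\theta_n$ lies in $I(\theta)$. Since

(118)  $\lim_{n \to \infty} P[W''_n(\theta) \mid \theta] = 1$  uniformly for all $\theta$ in $\Omega$,

also

(119)  $\lim_{n \to \infty} P[\hat\theta_n \in I(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. The relation (119) combined with the conditions II''_b
and II''_c is equivalent to Condition II. Hence our Lemma is proved.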



EXPERIMENTAL DETERMINATION OF THE MAXIMUM OF A FUNCTION¹

By Harold Hotelling
Columbia University, New York City

1. The necessary background for efficient experimental determinations. We
shall deal with the problem of arranging an experiment for determining the
value of $x$ for which an unknown function $f(x)$ is a maximum or minimum.
This problem is to be distinguished from those of estimating the maximum or
minimum itself, and of studying the distributions of such estimates, problems
to which Bernstein [1] and Rice [2] have contributed.

The range of applications in which determinations of maximizing and minimizing
values are important is extremely wide. Among these are the determination
of the time of year at which the number of algae or bacilli in a lake
is a maximum, and the amount of fertilizers and of irrigation water making the
yield of a crop a maximum. The magnetic permeabilities of permalloys, perminvars
and permendurs as functions of the induction, and the hardness of a
copper-iron alloy as a function of the time of aging at 500°C., possess smooth
maxima having interest in telephony [3], [4]. The effective range of a gun is a
function of the speed of burning of the powder, a variable which can be controlled.
Almost every entrepreneur has a fervent desire to know the selling
prices that will yield a maximum profit, and a few have undertaken controlled
experiments with a view to finding out. There are also numerous practical
problems of minimizing costs; for example, the cost of operating a ship as a
function of its speed possesses a minimum. We shall confine our attention
chiefly to the experimental determination of maxima, since such problems seem
to occur naturally with greater frequency in applications; there is no loss of
generality in this, since $f(x)$ has a maximum where $-f(x)$ has a minimum.

¹ Presented at the joint meeting of the Institute of Mathematical Statistics and the 
American Mathematical Society at Hanover, September 10, 1940. 

We shall assume that, for each value of x in the set we shall select, one or 
more observations will be made on y = f(x), and that these observations are 
afflicted with errors which are independently distributed about zero with a 
common variance $\sigma^2$. From this it follows that if f(x) is a linear function of 
known functions of x, with unknown coefficients $\beta_0, \beta_1, \cdots, \beta_p$ (for example 
a polynomial in x), the most efficient method of fitting is the method of least 
squares, which yields unbiased estimates $b_0, \cdots, b_p$ of $\beta_0, \cdots, \beta_p$ having the 
least possible variances; this is true whether or not the errors are normally 
distributed. If the fourth moment of the errors is finite, and if the number N 
of observations is large, the estimated coefficients will be distributed in an 
approximately normal manner; and so also will any function of them that is 
regular in a fixed neighborhood of its "population value." By the "population 
value" of a function $\phi(b_0, \cdots, b_p)$ we mean $\phi(\beta_0, \cdots, \beta_p)$. In particular, if 

$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$ 

has a maximum for $x = \xi$ of the simplest type, such that $f'(\xi) = 0$ and $f''(\xi) < 0$, 
so that $\xi$ is a simple root of the equation 

$f'(\xi) = \beta_1 + 2\beta_2\xi + \cdots + p\beta_p\xi^{p-1} = 0$, 

and if $x_0$ is an estimate of $\xi$ found from the polynomial fitted by the method of 
least squares, so that 

$b_1 + 2b_2x_0 + \cdots + pb_px_0^{p-1} = 0$, 

this last equation defines $x_0$ as a function of $b_1, \cdots, b_p$. The function is, to 
be sure, multiple-valued when p > 2; but for sufficiently large values of N the 
probability will become arbitrarily great that the roots obtained from a random 
experiment will each differ by an arbitrarily small quantity from one of the roots 
of $f'(x) = 0$. Then provided we have a sufficient preliminary approximate knowledge 
of $\xi$, we may choose the root nearest $\xi$; and the probability distribution 
of this root, which in nearly all experiments will be a single-valued function 

$\phi(b_1, \cdots, b_p)$, 

will approach normality of form, with standard error of order $N^{-1/2}$, about a 
mean differing from 

$\xi = \phi(\beta_1, \cdots, \beta_p)$ 

at most by terms of order $N^{-1}$, which are thus negligible in comparison with the 
standard error. The situation will be effectively the same if, without knowing $\xi$ 
in advance even approximately, we choose the root $x_0$ giving the greatest value 
$f(x_0)$, provided $f(\xi)$ is greater than any other value of f(x). 

From these considerations it appears advisable, whenever the unknown function 
is capable of being represented adequately by a polynomial of degree p 
considerably less than the number N of observations, to fit a polynomial of 
degree p by least squares, and from it to determine the maximizing value by 
differentiation. In practice, however, there are obstacles to carrying out such 
a procedure with confidence. The form of the function is usually not known; 
it is far from clear what value should be given p even if the function is to be 
regarded as a polynomial; the use of a polynomial which does not give a sufficiently 
good fit, with observations taken at a considerable distance from the 
maximizing value, perhaps separated from it by other maxima and minima, 
appears to be a highly dubious proceeding; and if p is taken large, the labor of 
calculation becomes excessive. For all these reasons it is desirable to assign 
the values of x which are to be the basis of the experimental work close enough to 
the maximizing value $\xi$ so that a polynomial of very low degree will fit adequately 
in the neighborhood. 

We shall restrict ourselves to functions having continuous derivatives of all 
relevant orders² in a neighborhood of $\xi$. Such a function can in a sufficiently 
small neighborhood be approximated by a polynomial of the second degree. 
The necessity of using a polynomial of higher degree can therefore be avoided, 
when a fairly good knowledge of the function is already in hand, and when the 
number N of observations that can be made is large enough, by choosing all 
the values of x in a sufficiently small neighborhood of $\xi$. We shall suppose that 
this is done; that is, a regression equation 

$Y = b_0 + b_1x + b_2x^2$ 

is fitted by least squares to a large number of observations after choosing the 
values of x quite close to the true maximizing value $\xi$; and the estimate $x_0$ of $\xi$ 
is a solution of $dY/dx = b_1 + 2b_2x = 0$, so that 

$x_0 = -\dfrac{b_1}{2b_2}$. 

We shall examine the errors in $x_0$ arising both from the inadequacy that may 
exist in the quadratic approximation and from the random errors of observation, 
and shall consider what distribution of x may most appropriately be chosen to 
reduce the errors of both kinds, and to place them in a suitable balance with 
each other. 

² Other cases may well arise in practice and deserve separate consideration in connection 
with the particular investigations in which they arise. For example various physical 
properties of alloys, regarded as functions of the proportion of a particular constituent, 
have maxima, but may have discontinuous derivatives because of the phenomena of 
crystallization and solution of one metal in another. The assumptions appropriate to an 
investigation, parallel to that of the present paper, of the proper organization of experiments 
for finding such metallurgical maxima must be drawn from metallurgy. The case of 
continuous derivatives is however of widespread importance. If no regularity assumption is 
made about the function, one set of N values of x is as good as another, and no set is likely 
to tell us very much about the function if it is one of the violently irregular ones utilized 
in the theory of functions to emphasize the necessity of studying that subject. 

It will be observed that a fairly definite preliminary knowledge of the function 
under investigation is required for such a program. Any criterion for the selection 
of values of x for experimentation must involve not only the value of $\xi$ 
but also the values of the first few derivatives in a neighborhood of $\xi$, or some 
similar information. The requirement of preliminary information is essential 
for the efficient design of experiments in general. For instance the efficiency 
of an agricultural field experiment depends on the correctness of the appraisal, 
before the experiment is laid down, of the general nature of the fertility gradients 
likely to exist in the field and of the variances due to error and main effects 
which will be revealed more accurately by the experiment itself. If the preliminary 
information is incorrect, a properly arranged self-contained experiment 
will nevertheless give results which are valid, in the sense that the significance 
probabilities calculated from them by accurate methods are correct, but will be 
inefficient, in the sense that another experiment of the same cost, based on better 
preliminary information, would be more likely to detect real effects through the 
smallness of such a calculated probability. The efficient conduct of experimentation 
thus proceeds in stages of ascending magnitude. A large-scale investigation 
should be preceded by a smaller one designed primarily to obtain 
information for use in designing the large one. The small preliminary investigation 
may well in turn be preceded by a still smaller pre-preliminary investigation, 
and so on,³ like an army marching after an advance guard, which follows a more 
advanced smaller detachment, which follows a still smaller and still more advanced 
unit, which follows a "point." At the very beginning of the process of 
chain experimentation will stand work based on little or no clear information 
of the kind required for efficient design. This first phase will be speculative 
and exploratory in character. Neither its cost nor its accuracy can well be 
estimated in advance. It is a favorite, but not exclusive, preoccupation of men 
of genius. Many of its results turn out to be worthless. But it is an essential 
preliminary to well-organized research directed to definite aims defined qualitatively 
in advance. 

After the first speculative and unsystematic phase in the knowledge of a 
subject is past, but before the careful, economical organization of an accurate 
investigation, an intermediate type of exploration is needed to supply estimates 
of the parameters required for the design of the full-scale investigation. In the 
present case such a systematic though small-scale experiment might perhaps 
consist in dividing a range within which the desired maximizing value $\xi$ is known 
to lie into equal parts, making at least two observations at each of the ends of 
these intervals, and fitting a polynomial of at least the fifth degree by least 
squares. This will make possible estimates of the parameters $\sigma$ and the $\beta$'s 
(and hence of $\xi$) required for using the efficient designs which we shall obtain. 
At least six different values of x are required for fitting the polynomial of the 
fifth degree. The fitting process is facilitated by taking them in arithmetic 
progression and using orthogonal polynomials. 


3 A remarkable example of such a series of investigations is the chain of sample censuses 
of area of jute in Bengal carried out for the Indian Central Jute Committee under the 
direction of Prof. P. C. Mahalanobis annually beginning in 1937. Each year’s work is 
designed primarily to obtain information for planning the next year’s, and a sequence of 
four or five such investigations, each considerably larger than the preceding, is planned 
to lead up to an eventual annual sampling of the whole immense jute area in the province. 
A partial account of this is given in [5], a fuller one in confidential but printed reports of 
the Indian Central Jute Committee, Calcutta. 

Certain multiple-sample schemes in manufacturing inspection also provide good 
examples of chain experiments [6]. 

2. Sampling errors and bias in the quadratic approximation. Let us measure 
all values of x from the value $\xi$ under investigation which makes f(x) a maximum. 
Then $\xi = 0$, and in the expansion 

(1) $f(x) = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + \cdots$ 

we shall have $\beta_1 = 0$ and $\beta_2 \le 0$; we shall assume that $\beta_2 < 0$. An observation 
$y_\alpha$ corresponding to a chosen value $x_\alpha$ will have, by assumption, an error $\Delta_\alpha$ of 
zero expectation and variance $\sigma^2$, such that 

(2) $y_\alpha = f(x_\alpha) + \Delta_\alpha$. 

A quadratic estimate 

(3) $Y = b_0 + b_1x + b_2x^2$ 

of f(x) is obtained by means of normal equations which may be written 

    $a_0b_0 + a_1b_1 + a_2b_2 = Sy$ 
(4) $a_1b_0 + a_2b_1 + a_3b_2 = Sxy$ 
    $a_2b_0 + a_3b_1 + a_4b_2 = Sx^2y$, 

where S stands for summation over all the observations, so that, for example, 
$Sy = \Sigma y_\alpha = y_1 + y_2 + \cdots + y_N$, and where 

(5) $a_k = Sx^k$. 

In particular, $a_0 = N$. A determinate solution is possible only if there are at 
least three distinct values of x; we shall always suppose therefore that this is 
the case. This is equivalent to assuming that the determinant a of the coefficients 
in (4) is not zero. A greater number of observations y is necessary to 
obtain an estimate of the variance $\sigma^2$, and furthermore we shall suppose this 
number large in our approximations, but since repeated observations may be 
made for each value of x, it is not essential that there be more than three values 
of x in the distribution to be selected. 
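
The normal equations (4) can be assembled directly from the power sums (5). The following is a minimal sketch in Python and is not part of the original paper: the names `fit_quadratic` and `maximizing_value` are illustrative assumptions, the data are hypothetical and noiseless, and exact rational arithmetic is used only to keep the example free of rounding.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Solve the normal equations (4) by Cramer's rule; returns (b0, b1, b2)."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    a = [sum(x ** k for x in xs) for k in range(5)]                    # a_k = S x^k
    rhs = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # Sy, Sxy, Sx^2y
    A = [[a[i + j] for j in range(3)] for i in range(3)]
    d = det3(A)
    if d == 0:
        raise ValueError("at least three distinct values of x are required")
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][col] = rhs[i]          # replace one column by the right-hand side
        coeffs.append(det3(M) / d)
    return tuple(coeffs)


def maximizing_value(b):
    """Stationary point x0 = -b1 / (2 b2) of the fitted parabola."""
    return -b[1] / (2 * b[2])


# Hypothetical noiseless data from f(x) = 5 + 4x - x^2, whose maximum is at x = 2.
xs = [0, 1, 2, 3]
ys = [5 + 4 * x - x * x for x in xs]
b = fit_quadratic(xs, ys)
print(b, maximizing_value(b))
```

With fewer than three distinct values of x the determinant vanishes and the sketch raises an error, which mirrors the condition stated above.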

If we put 

(6) $\delta b_k = b_k - \beta_k$, $\gamma_k = Sx^k\Delta$, 

for k = 0, 1, 2, substitute (1) in (2) and the result in (4), and utilize (5) and 
(6), we obtain 

    $a_0\,\delta b_0 + a_1\,\delta b_1 + a_2\,\delta b_2 = \gamma_0 + a_3\beta_3 + a_4\beta_4 + \cdots$ 
(7) $a_1\,\delta b_0 + a_2\,\delta b_1 + a_3\,\delta b_2 = \gamma_1 + a_4\beta_3 + a_5\beta_4 + \cdots$ 
    $a_2\,\delta b_0 + a_3\,\delta b_1 + a_4\,\delta b_2 = \gamma_2 + a_5\beta_3 + a_6\beta_4 + \cdots$ 

From these equations it follows that the errors $\delta b_k$ are homogeneous linear functions 
of the right-hand members and will therefore be small if the quantities on 
the right are small. Of these quantities, the $\gamma_k$'s will be stochastically of the 
order $N^{1/2}$ for large samples with any fixed set of values of x. When the equations 
are solved, their coefficients will be of the order of $N^{-1}$, so that the product 
is of order $N^{-1/2}$, and becomes negligible if N is large enough. The coefficients 
$a_k$ of $\beta_3, \beta_4, \cdots$ can be kept small if the values of x are chosen to lie within a sufficiently 
restricted range. Of course the coefficients $a_k$ in the left members of (7) will 
also be small in this case, but not small enough to offset fully the smallness of 
those on the right. To see this, we observe that if all the values of x be multiplied 
by any quantity g, $a_k$ is multiplied by $g^k$, while 

(8) $a = \begin{vmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \end{vmatrix}$ 

is multiplied by $g^6$. The cofactors of the last column are proportional respectively 
to $g^4$, $g^3$ and $g^2$. Hence, in the expression for $\delta b_2$, the coefficient of $\beta_3$ is 
of order g, that of $\beta_4$ is of order $g^2$, and so on, the coefficients of the $\beta$'s of higher 
orders vanishing more and more rapidly with g as we go on in the sequence. 
The like is true of $\delta b_1$ and $\delta b_0$, which vanish even more rapidly with g. Thus 
we may, by restricting sufficiently the range of x on the basis of the assumed 
preliminary knowledge of the function, and taking a sufficiently large sample 
of observations, bring it about that the probability will be arbitrarily close to 
unity that the $\delta b_k$'s are less than any assigned limits. 
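
The scaling argument can be checked numerically. The sketch below is an illustration only (the design `xs` and the multiplier `g` are hypothetical choices, and the helper names are assumptions): it multiplies every x by g and compares the power sums, the determinant (8) and the cofactors of its last column before and after.

```python
from fractions import Fraction


def power_sums(xs, kmax):
    """a_k = S x^k for k = 0..kmax."""
    return [sum(Fraction(x) ** k for x in xs) for k in range(kmax + 1)]


def moment_determinant(a):
    """The determinant a of (8), expanded along its first row."""
    return (a[0] * (a[2] * a[4] - a[3] ** 2)
            - a[1] * (a[1] * a[4] - a[2] * a[3])
            + a[2] * (a[1] * a[3] - a[2] ** 2))


def last_column_cofactors(a):
    """Cofactors of the elements a_2, a_3, a_4 in the last column of (8)."""
    return (a[1] * a[3] - a[2] ** 2,
            -(a[0] * a[3] - a[1] * a[2]),
            a[0] * a[2] - a[1] ** 2)


xs = [-1, 0, 2, 3]              # hypothetical design, one observation per value
g = Fraction(1, 2)              # common multiplier applied to every x
a = power_sums(xs, 4)
a_scaled = power_sums([g * x for x in xs], 4)

# The determinant picks up the factor g^6; the cofactors pick up g^4, g^3, g^2.
print(moment_determinant(a_scaled) == g ** 6 * moment_determinant(a))
print([cs / c for cs, c in zip(last_column_cofactors(a_scaled),
                               last_column_cofactors(a))])
```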

Let us, in particular, restrict the range sufficiently and take a large enough 
sample to make it reasonable to regard $\delta b_2$ as negligible in comparison with $\beta_2$. 
The error in the estimate 

(9) $x_0 = -\dfrac{b_1}{2b_2}$ 

of the maximizing value $\xi$ will, since we are taking $\xi = 0$, be $x_0$ itself, and may 
be written 

$\delta x_0 = -\dfrac{\delta b_1}{2(\beta_2 + \delta b_2)} = -\dfrac{\delta b_1}{2\beta_2}\left(1 - \dfrac{\delta b_2}{\beta_2} + \cdots\right)$, 

where the terms other than 1 in the last parentheses are negligible. The problem 
of minimizing the error $\delta x_0$ is then virtually equivalent to minimizing the error 
$\delta b_1$. In section 5 it will be shown that it is not until we reach terms of the 
order of $g^5$ that the errors $\delta b_2$ need be taken into account. We shall first discuss 
the errors in $x_0$ of lower orders in g, and thus confine the discussion to $\delta b_1$. For 
the present we shall take as the quantity to be made as small as possible the 
expectation of the square of this last error, $E(\delta b_1)^2$. This is not the same as the 
variance of $b_1$, since $E\,\delta b_1$ is not in general zero. We have, in fact, by transposing 
a familiar formula for the variance, 

(10) $E(\delta b_1)^2 = (E\,\delta b_1)^2 + \sigma^2_{b_1}$, 

thus dividing our minimand into two parts, due respectively to the bias arising 
from the neglect of terms of third and higher orders, and to the usual sampling 
errors. 

By the usual least-square theory, the sampling variance of $b_1$ is 

(11) $\sigma^2_{b_1} = \mu\sigma^2$, 

where $\mu$ is the cofactor of the central element in a, divided by a, that is, 

(12) $\mu = (a_0a_4 - a_2^2)/a$. 

Since $\mu$ is of the order of $g^{-2}$, we may reduce the sampling variance as much as 
we please by taking the values of x sufficiently far removed from $\xi$. If f(x) is 
definitely known to be only of the second degree, a wide dispersion of the desirable 
values of x is thus indicated, since in this case $E\,\delta b_1 = 0$, as appears by taking 
the expectation of each term in (7). But if, as will usually be the case, 
f(x) has terms of higher orders than the second, an excessively wide dispersion 
may increase the bias $E\,\delta b_1$ to such an extent as to render the quadratic approximation 
inapplicable. 
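
Formula (11) can be illustrated by writing $b_1$ as a linear combination of the observations. The sketch below is an assumption-laden illustration rather than anything taken from the paper: the helper names and the design are hypothetical. By Cramer's rule the weight attached to an observation at x is $\lambda + \mu x + \nu x^2$, where $\lambda$, $\mu$, $\nu$ are the cofactors of the second column of (8) divided by a, so for independent errors of variance $\sigma^2$ the variance of $b_1$ is $\sigma^2$ times the sum of squared weights.

```python
from fractions import Fraction


def second_column_ratios(xs):
    """(lam, mu, nu): cofactors of the second column of (8), each divided by a."""
    a = [sum(Fraction(x) ** k for x in xs) for k in range(5)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det        # formula (12)
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    return lam, mu, nu


def b1_weights(xs):
    """Weights w such that b_1 = sum(w_alpha * y_alpha)."""
    lam, mu, nu = second_column_ratios(xs)
    return [lam + mu * Fraction(x) + nu * Fraction(x) ** 2 for x in xs]


xs = [-1, 0, 2, 3]                      # hypothetical design, one observation each
lam, mu, nu = second_column_ratios(xs)
w = b1_weights(xs)

# Sum of squared weights equals mu, so Var(b_1) = mu * sigma^2 as in (11).
print(sum(wi ** 2 for wi in w) == mu)
# The weights reproduce the slope of any exact quadratic, so b_1 is then unbiased.
print(sum(w), sum(wi * x for wi, x in zip(w, xs)), sum(wi * x * x for wi, x in zip(w, xs)))
```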

In taking the expectation of each term of (7) and then solving for $E\,\delta b_1$ we 
obtain, since $E\gamma_k = 0$ according to the definition of $\gamma_k$, and because $E\Delta = 0$, 
a result of the form 

(13) $E\,\delta b_1 = B_3\beta_3 + B_4\beta_4 + B_5\beta_5 + \cdots$. 

We shall call $B_3$, $B_4$, and $B_5$ respectively the cubic, quartic and quintic components 
of the bias, or simply biases. If we denote by $\lambda$, $\mu$, $\nu$ the ratios to a 
of the cofactors of the second column of a, so that 

(14) $\lambda a_1 + \mu a_2 + \nu a_3 = 1$, 

we shall have for the components of bias, 

     $B_3 = \lambda a_3 + \mu a_4 + \nu a_5$ 
(15) $B_4 = \lambda a_4 + \mu a_5 + \nu a_6$ 
     $B_5 = \lambda a_5 + \mu a_6 + \nu a_7$, 

and so forth. Since $\lambda$, $\mu$, and $\nu$ are of respective orders -1, -2 and -3 in a 
multiplier g of all the values of x, $B_3$ is of order 2, $B_4$ is of order 3, and the higher 
biases are of higher orders. Thus if we begin with any particular distribution 
of x and apply a sufficiently small multiplier g, we can make the quartic bias 
negligible in comparison with the cubic, the quintic in comparison with the 
quartic, and so forth, provided none of these biases is zero. But in reducing 
g we increase the sampling variance, which is of the order of $g^{-2}$. 
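
The orders of the bias components in the multiplier g can be confirmed with exact arithmetic. This is a minimal sketch under stated assumptions: the function name `bias_components`, the design and the value of g are hypothetical, and the components are computed straight from (15).

```python
from fractions import Fraction


def bias_components(xs):
    """(B3, B4, B5) of (15) for a design listed observation by observation."""
    a = [sum(Fraction(x) ** k for x in xs) for k in range(8)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    assert lam * a[1] + mu * a[2] + nu * a[3] == 1     # relation (14)
    return tuple(lam * a[j] + mu * a[j + 1] + nu * a[j + 2] for j in (3, 4, 5))


xs = [-1, 0, 2, 3]                  # hypothetical design
g = Fraction(1, 10)                 # shrink every x toward the origin
B = bias_components(xs)
B_small = bias_components([g * x for x in xs])

# B3, B4, B5 are multiplied by g^2, g^3, g^4 respectively.
for order, big, small in zip((2, 3, 4), B, B_small):
    print(order, small == g ** order * big)
```

Shrinking the design therefore suppresses the higher components fastest, at the price of the larger sampling variance noted above.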

Under these conditions it is reasonable to consider what types of distribution 
having a fixed value of the sampling variance make the cubic bias a minimum 
in absolute value; then if there is more than one distribution of this kind, to 
seek among them a class minimizing the absolute value of the total of cubic and 
quartic biases; and among these a class minimizing the absolute value of the 
total of cubic, quartic and quintic biases, with the modified meaning of the 
quintic bias taking account of $\delta b_2$. 


3. The cubic and quartic biases. We find, somewhat unexpectedly, that 
there exists a class of distributions of x for which the cubic bias is actually zero. 
To exemplify this we need give the variable no more than three different values, 
which we may call x, y and z, and we may assign to them the arbitrary frequencies 
k, m, n of experiments (k + m + n = N). If we put 

(16) $P = \begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{vmatrix} = (x - y)(y - z)(z - x)$, 

and consider a matrix of three rows and N columns, of which k columns are 
identical with the first column of P, m with the second, and n with the third, 
it is evident that the sum of the squares of the three-rowed determinants in 
this matrix is $kmnP^2$. But this sum of squares is also equal to the determinant 
formed from the sums of products of the three rows, and this is a (formula (8)). 
Thus $a = kmnP^2 \ne 0$, since x, y, z are all different. Together with the foregoing 
3 X N matrix consider another, 

(17) $\begin{pmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{pmatrix}$ 

having k columns identical with that first written, m identical with the second 
written, and n identical with the third. The only non-vanishing three-rowed 
determinants in this matrix are formed of these three different columns, and 
equal (xy + yz + zx)P; there are kmn of them. The sum of products of corresponding 
three-rowed determinants in the two matrices is therefore 
$kmnP^2(xy + yz + zx)$. But this sum is also equal to the determinant, formed 
from the sums of products of corresponding rows, 

$\begin{vmatrix} a_0 & a_2 & a_3 \\ a_1 & a_3 & a_4 \\ a_2 & a_4 & a_5 \end{vmatrix}$ 

which, by (15), equals $-aB_3$. It follows that 

(18) $-B_3 = xy + yz + zx$. 

There are many real solutions of the equation 

(19) $xy + yz + zx = 0$, 

with the three values all different, for example -2, 3, 6. If we assign such values 
to our variable, and an arbitrary number of experimental determinations to each 
of these values, the cubic bias $B_3$ will be zero. 

It will be noticed that such a solution cannot have zero for one of the values. 
If, for example, z = 0 in (19), then x or y must also vanish, in violation of the 
condition that there must be at least three distinct values. Moreover a solution 
cannot be symmetrical about zero; if x + y = 0 it follows from (19) that 
x = y = 0. A solution may or may not be symmetrical about a value other 
than zero. The values $3 - 2\sqrt{3}$, $(3 - \sqrt{3})/2$, $\sqrt{3}$ satisfy the equation and 
are in arithmetic progression, while the solution -2, 3, 6 is asymmetrical. 

If we modify (17) by replacing the cubes of the variables by their fourth 
powers, and apply the same procedure to the modified matrix, we find that 

(20) $B_4 = -(x + y)(y + z)(z + x)$. 

Thus there exist sets of three distinct real values making the quartic bias vanish, 
for example any set for which x + y = 0; but no such set can at the same time 
nullify the cubic bias (18). Since it is ordinarily more important for the cubic 
than for the quartic bias to vanish, distributions nullifying (20) are not in 
general to be recommended. But in exceptional cases it may be known that 
$\beta_3$ is zero, or very small in comparison with $\beta_4$, and then the vanishing of $B_4$ 
is a more valuable property than that of $B_3$. It will be shown that no distribution 
of three or more values exists such that both the cubic and quartic components 
of bias are zero. 
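
The closed forms (18) and (20) can be compared with the general definitions (15) for three-valued designs with arbitrary frequencies. The sketch below is illustrative only: the function names, the test designs and the frequencies are hypothetical choices made for this example.

```python
import math
from fractions import Fraction


def cubic_quartic_bias(points, freqs):
    """B3 and B4 from (15) when points[i] is observed freqs[i] times."""
    a = [sum(f * Fraction(x) ** k for x, f in zip(points, freqs)) for k in range(7)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    return (lam * a[3] + mu * a[4] + nu * a[5],
            lam * a[4] + mu * a[5] + nu * a[6])


def closed_forms(x, y, z):
    """Right-hand sides of (18) and (20) for a three-valued design."""
    return (-(x * y + y * z + z * x), -(x + y) * (y + z) * (z + x))


# An arbitrary design, the design -2, 3, 6 of the text, and a symmetrical pair.
for pts, fr in (([1, 2, 4], [2, 3, 5]), ([-2, 3, 6], [1, 2, 3]), ([-1, 1, 3], [4, 1, 2])):
    print(pts, cubic_quartic_bias(pts, fr), closed_forms(*pts))

# The arithmetic-progression solution involving sqrt(3), checked in floating point.
r = math.sqrt(3)
x, y, z = 3 - 2 * r, (3 - r) / 2, r
print(abs(x * y + y * z + z * x) < 1e-12)
```

The frequencies drop out of both biases, as the derivation implies; only the three values themselves matter.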

Let us denote by $D_p$ the p-rowed determinant having $a_{i+j-2}$ as the element 
in its ith row and jth column. Thus $D_3$ is the same determinant which we have 
in (8) called a, and 

(21) $D_4 = \begin{vmatrix} a_0 & a_1 & a_2 & a_3 \\ a_1 & a_2 & a_3 & a_4 \\ a_2 & a_3 & a_4 & a_5 \\ a_3 & a_4 & a_5 & a_6 \end{vmatrix}$. 

For every distribution, every $D_p \ge 0$; and a necessary and sufficient condition 
that a distribution have p or more distinct values is that $D_p$ be greater than 
zero [7, p. 362]. If $D_p$ is positive, so is each of its principal minors. In 
particular, since we are requiring at least three values in a distribution, $D_3 = 
a > 0$, and therefore 

(22) $a_2a_4 - a_3^2 > 0$, 

and 

(23) $a_0a_4 - a_2^2 > 0$. 

We shall now consider distributions for which the cubic bias $B_3$ is zero, and 
consequently, by (15), 

(24) $\lambda a_3 + \mu a_4 + \nu a_5 = 0$, 

and expand $D_4$. From the definition of $\lambda$, $\mu$, $\nu$, we have 

(25) $\lambda a_2 + \mu a_3 + \nu a_4 = 0$. 

Multiply the last row of the determinant (21) by $\nu$ and add to it $\lambda$ times the 
second row and $\mu$ times the third. The last row is thus, by (14), (25), (24) 
and (15) transformed into 

1 0 0 $B_4$, 

while the determinant has been multiplied by $\nu$. Let this new determinant be 
expanded with respect to its last row. The cofactor of the first element 1 is 

$G = -\begin{vmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix}$. 

Let the last row of this determinant be multiplied by $\nu$, an operation having 
the effect of multiplying the whole determinant by $\nu$; and let $\lambda$ times the first 
row and $\mu$ times the second row then be added to the last. The last row is 
thus, by (14), (25) and (24) reduced to 

1 0 0. 

Hence 

$\nu G = -(a_2a_4 - a_3^2)$, 

and consequently 

(26) $\nu^2D_4 = \nu(aB_4 + G) = \nu aB_4 - (a_2a_4 - a_3^2)$. 

Since the first member of this equation is positive or zero, (22) shows that it is 
impossible that $B_4$ should equal zero when $B_3 = 0$ as we have assumed. That is, 
Either the cubic or the quartic bias of every distribution having three or more distinct 
values must be different from zero. 
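
Identity (26) can be checked on a concrete design. The sketch below is an illustration under assumptions: the four-valued design (the values -1, 1, 2 taken 39 times each and the value 4 taken once) was constructed for this example so that $B_3$ vanishes exactly, and the helper names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def hankel(a, p):
    """D_p: the p-rowed determinant with a_{i+j-2} in row i, column j."""
    return det([[a[i + j] for j in range(p)] for i in range(p)])


# Hypothetical design with four distinct values, built so that B3 = 0.
points, freqs = [-1, 1, 2, 4], [39, 39, 39, 1]
a = [sum(f * Fraction(x) ** k for x, f in zip(points, freqs)) for k in range(7)]

D3, D4 = hankel(a, 3), hankel(a, 4)
lam = -(a[1] * a[4] - a[2] * a[3]) / D3
mu = (a[0] * a[4] - a[2] ** 2) / D3
nu = (a[1] * a[2] - a[0] * a[3]) / D3
B3 = lam * a[3] + mu * a[4] + nu * a[5]
B4 = lam * a[4] + mu * a[5] + nu * a[6]

lhs = nu ** 2 * D4                                       # first member of (26)
rhs = nu * D3 * B4 - (a[2] * a[4] - a[3] ** 2)           # second member of (26)
print(B3 == 0, lhs == rhs, B4 != 0)
```

Because the design has four distinct values, $D_4$ is strictly positive here, and the quartic bias is accordingly different from zero.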

If $\nu$ were zero, (26) would contradict (22). Hence $\nu \ne 0$. With every distribution 
of x there is associated another obtained from it by changing the sign 
of each value of x. Such a pair of distributions we shall call opposite. When 
we pass from a distribution to its opposite, the power-sums $a_k$ remain unchanged 
when k is even and change only in sign when k is odd. Since a is 
always positive, and since 

(27) $\nu = (a_1a_2 - a_0a_3)/a$, 

$\nu$ has opposite signs and the same absolute value for opposite distributions. 
The conclusions to be reached shortly will be equally valid for a distribution 
and its opposite, and in reaching them we may assume $\nu > 0$. It will then 
follow from (22) and (26) that $B_4 > 0$. 

4. Distributions nullifying cubic bias with minimum quartic bias. We can 
now prove the following theorem: 

Among distributions for which the cubic bias vanishes and the standard error of 
$b_1$ has a fixed value, those for which the quartic bias is a minimum have exactly 
three distinct values of the variable. These values satisfy the equation 

(28) $xy + yz + zx = 0$. 

Since the standard error $\sigma$ of a single observation is not affected by the distribution 
chosen for x, fixation of the standard error of $b_1$ is equivalent by (11) 
to fixation of the value of the expression given by (12). We suppose therefore 
that $\mu$ has some fixed positive value and that $B_3 = 0$. Since $\mu$, $B_3$ and $B_4$ do 
not involve the distribution of x excepting through the power-sums $a_0, a_1, \cdots, 
a_6$, we may treat these power-sums as the independent variables in trying to 
make $B_4$ a minimum. Their region of variation is limited by the inequalities 
referred to in the preceding section, 

$D_1 = a_0 > 0$, $D_2 > 0$, $D_3 = a > 0$, $D_4 \ge 0$. 

The inequalities $D_p \ge 0$ for p > 4 involve power-sums of orders higher than 
the sixth and are irrelevant to our purpose. 

The definition (8) of a shows that it is independent of $a_5$ and $a_6$; consequently 
$\lambda$, $\mu$, and $\nu$ are also. According to (15), $B_3$ involves $a_5$ but not $a_6$; while of all 
the expressions we have considered, only $B_4$ and $D_4$ are functions of $a_6$. Therefore 
when $a_0, a_1, \cdots, a_5$ are given any definite values, $a_6$ may be chosen to 
make $B_4$ a minimum without any regard to the fixed values of $\mu$ and $B_3$. Now 
(15) shows that $B_4$ is a linear function of $a_6$ with a coefficient which, at the end 
of the last section, we have proved not to be zero and assumed positive. Thus 
$B_4$, which is also positive, is an increasing function of $a_6$. Its minimum will 
correspond to the least value of $a_6$ consistent with the condition $D_4 \ge 0$. But 
(21) shows that $D_4$ is also a linear function of $a_6$ with a positive coefficient, 
a. The minimum of $a_6$, and therefore that of $B_4$, require therefore 
that $D_4 = 0$. But $D_4 = 0$ is exactly the condition that there should be no more 
than three distinct values in the distribution. Since there must be at least 
three distinct values, and since if there are only three they must satisfy (19), 
the theorem is proved. 

The minimum value of $B_4$ with respect to variations of $a_6$ when $B_3 = 0$ may 
be found by putting $D_4 = 0$ in (26). Designating this minimum by b and 
using (27) we have 

(29) $b = \dfrac{a_2a_4 - a_3^2}{a_1a_2 - a_0a_3}$, 

where the numerator is intrinsically positive, and the denominator is positive 
for the class of distributions we are now considering, though we might equally 
well consider the opposite distributions, for which it is negative. We have also 
from (20), 

(30) $(x + y)(y + z)(z + x) = -b$. 

Substituting for each of these binomials its value as given by (28), we may write 
this in the simpler form 

(31) $xyz = b > 0$. 

It was shown at the beginning of section 3 that when there are only three 
values in the distribution, with frequencies k for x, m for y, and n for z, 

(32) $a = kmnP^2 = kmn(x - y)^2(y - z)^2(z - x)^2$. 

The first two rows of (17) form a matrix such that the sum of the squares of 
its two-rowed determinants is 

(33) $mn(y^2 - z^2)^2 + nk(z^2 - x^2)^2 + km(x^2 - y^2)^2$. 

Since this is equal to the determinant of the sums of products of the rows, 
namely 

$\begin{vmatrix} a_0 & a_2 \\ a_2 & a_4 \end{vmatrix}$, 

it follows from (12), (32) and (33) that 

(34) $\mu = \dfrac{(y + z)^2}{k(x - y)^2(x - z)^2} + \dfrac{(z + x)^2}{m(x - y)^2(y - z)^2} + \dfrac{(x + y)^2}{n(x - z)^2(y - z)^2}$. 

It is desired to minimize this expression, which is the factor of the variance 
that is independent of the accuracy of the individual observations, while holding 
b = xyz fixed; or to minimize b while holding $\mu$ fixed. In either case the 
values of x, y and z are to be chosen to satisfy (28). The relations established 
by the solution of either of these virtually equivalent problems will fix x, y, and z 
except for a factor of proportionality, which must then be adjusted to provide 
a balance as satisfactory as possible between random errors and bias. 
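
Formulas (29), (31) and (34) can be cross-checked on a three-valued design satisfying (28). The sketch below is only an illustration: the design (the opposite of -2, 3, 6, with unequal frequencies) and all function names are assumptions made for this example.

```python
from fractions import Fraction


def power_sums(points, freqs, kmax):
    """a_k = S x^k when points[i] is observed freqs[i] times."""
    return [sum(f * Fraction(x) ** k for x, f in zip(points, freqs))
            for k in range(kmax + 1)]


def mu_from_power_sums(a):
    """mu = (a0*a4 - a2^2) / a, formula (12)."""
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    return (a[0] * a[4] - a[2] ** 2) / det


def mu_three_point(x, y, z, k, m, n):
    """mu for values x, y, z with frequencies k, m, n, formula (34)."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return ((y + z) ** 2 / (k * (x - y) ** 2 * (x - z) ** 2)
            + (z + x) ** 2 / (m * (x - y) ** 2 * (y - z) ** 2)
            + (x + y) ** 2 / (n * (x - z) ** 2 * (y - z) ** 2))


def minimum_quartic_bias(a):
    """b from formula (29)."""
    return (a[2] * a[4] - a[3] ** 2) / (a[1] * a[2] - a[0] * a[3])


x, y, z = 2, -3, -6           # hypothetical design with xy + yz + zx = 0
k, m, n = 1, 2, 3             # hypothetical frequencies
a = power_sums([x, y, z], [k, m, n], 4)

print(mu_from_power_sums(a) == mu_three_point(x, y, z, k, m, n))
print(minimum_quartic_bias(a) == x * y * z)
```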

5. The quintic bias. Effect of $\delta b_2$. With any distribution determined in 
this way will be associated its opposite distribution, which will have the same 
minimizing properties so far as the variance and the cubic and quartic components 
of bias are concerned. The appropriate choice between these two opposite 
distributions will in general involve the quintic component of the bias. 
At this point we must, for the first time, take account of the errors in the 
denominator $b_2$ of $x_0$. 

Since $b_1$ converges stochastically to $Eb_1$, and $b_2$ to $Eb_2$, the error $x_0 = -\tfrac12 b_1/b_2$ 
converges stochastically (for large samples) to $-\tfrac12 Eb_1/Eb_2$. By keeping our 
values of x close enough to $\xi$ we may insure that $Eb_2$ differs as little as we please 
from $\beta_2$, and hence that the series 

$\dfrac{Eb_1}{Eb_2} = \dfrac{Eb_1}{\beta_2 + E\,\delta b_2} = \dfrac{Eb_1}{\beta_2}\left[1 - \dfrac{E\,\delta b_2}{\beta_2} + \left(\dfrac{E\,\delta b_2}{\beta_2}\right)^2 - \cdots\right]$ 

converges rapidly. Let us rearrange this series after inserting for $Eb_1$ and $E\,\delta b_2$ 
their values, so as to obtain a series in ascending powers of a common multiplier 
g which may be applied to the values of x. We recall that in the expression 
(13) for $Eb_1$, $B_3$ is of the second order in g, $B_4$ is of the third order, $B_5$ is of the 
fourth order, and so forth. In the same way, we find that 

$E\,\delta b_2 = C_3\beta_3 + C_4\beta_4 + \cdots$, 

where 

$C_3 = \dfrac{1}{a}\begin{vmatrix} a_0 & a_1 & a_3 \\ a_1 & a_2 & a_4 \\ a_2 & a_3 & a_5 \end{vmatrix}$ 

is of the first order, $C_4$ is of the second order, and so forth. Thus in 

$\beta_2\dfrac{Eb_1}{Eb_2} = B_3\beta_3 + (B_4\beta_4 - B_3C_3\beta_3^2/\beta_2)$ 
$\qquad + (B_5\beta_5 - B_4C_3\beta_3\beta_4/\beta_2 - B_3C_4\beta_3\beta_4/\beta_2 + B_3C_3^2\beta_3^3/\beta_2^2) + \cdots$, 

the first term is of the second order, those in the first parentheses are of the 
third order, those in the second parentheses are of fourth order, and the remaining 
terms are of higher orders. 

We have seen that we can choose distributions for which $B_3 = 0$. In this 
way we get rid of the second-order term and reduce the third-order terms to 
$B_4\beta_4$. We shall in the next two sections show how, under various conditions, 
to select from among the distributions for which $B_3 = 0$ an opposite pair for 
each of which $|B_4|$ is a minimum. In choosing between these two opposite 
distributions, the criterion we shall adopt is that the terms of third order and 
those of fourth order shall have opposite signs; for while the fourth-order terms 
may be made much smaller than those of third order in absolute value, still it 
is desirable that they should offset them, in order to reduce the error. The 
terms of third and of fourth orders reduce respectively for $B_3 = 0$ to $B_4\beta_4$ and 
to $B_5\beta_5 - B_4C_3\beta_3\beta_4/\beta_2$. Our criterion is that these are to have opposite signs, 
and consequently that 

$B_4\beta_2\beta_4(B_5\beta_2\beta_5 - B_4C_3\beta_3\beta_4) < 0$. 

We shall however modify this criterion whenever $\sigma$ is not negligibly small. 
A more precise criterion will be obtained by expanding $x_0^2$ in a series of powers 
of $\delta b_2$, taking the expectation term by term, and reducing the moments thus 
obtained of orders higher than the second to those of first and second orders by 
means of the theory of the bivariate normal distribution of $b_1$ and $b_2$. It is 
then necessary to make some assumption regarding the order of magnitude of 
x, y and z relatively to N in order to assemble terms of like magnitude in a 
criterion resembling that above but involving $\sigma$. The appropriate balance indicated 
by the results of the next two sections calls for x, y and z to be of the 
order of $N^{-1/8}$. This leads to the following criterion: 

— B\C$sfi\ — CsjSjMv 1 ) < 0. 

We have seen that B t = b = xyz. To evaluate Ct and 5,, which latter 
may in accordance with (15) be written 

do CLi CLB 
Bi = —— a\ dz at 

a 

02 <h 

we proceed as in section 3, replacing the second row of (17) by the first 
powers to obtain C,, and replacing the third row of (17) by the fifth powers 
of x, y and z to obtain Bi . In this way we find 


1 

1 

l 

1 

l 

l 

X 

y 

z f Bi = — p 

x 2 

y 2 

z s 

3 

, 

3 
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i 

6 
Letting $\Sigma x$, $\Sigma x^2yz$, etc. stand for the symmetric functions of x, y and z of which
one term is written in each case after $\Sigma$, we may reduce these expressions to

$$C_3 = \Sigma x,$$

$$B_5 = -\Sigma x^3y - \Sigma x^2y^2 - 2\Sigma x^2yz.$$

With the help of (28) and (31) we find

$$\Sigma x^2yz = xyz\,\Sigma x = b\,\Sigma x,$$

$$\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2b\,\Sigma x,$$

$$\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz = -b\,\Sigma x.$$

Therefore $B_5 = b\,\Sigma x$. Substituting these values for $B_4$, $C_3$ and $B_5$ in the
last inequality gives the rule:

Choose that one of a pair of opposite distributions for which 

(35) (i + y + z)/ StWBMt - (8,0 4 ) - 0, m * 2 1 < 0. 

It will be remembered that $\beta_2$ is negative for a maximum of f(x), positive for a
minimum. The other $\beta$'s can only be estimated from preliminary experimentation,
or possibly in particular cases from general knowledge or theory.
Quite different algebraic methods are appropriate to minimizing $\mu$ with a fixed
b according to the limitations to be placed on the frequencies k, m, n; the methods
leading very simply to a solution in one case involve troublesome complications
in another. We shall deal with two of the leading cases.


6. The case of equal frequencies. Some experimental situations call for equal 
frequencies for all values of the variable. If $k = m = n$, then $a_0 = N = 3n$.
Let $a_j' = a_j/n$. Then $a_0' = 3$ and $a_1' = \Sigma x$. Inasmuch as

(36) $\Sigma xy = 0$ and $xyz = b$,

we may express $a_2'$, $a_3'$ and $a_4'$ as functions of $a_1'$ and b as follows:

$$a_2' = \Sigma x^2 = (\Sigma x)^2 - 2\Sigma xy = a_1'^2,$$

$$a_3' = \Sigma x^3 = (\Sigma x)^3 - 3\Sigma x^2y - 6xyz;$$

and since $\Sigma x^2y = \Sigma x\,\Sigma xy - 3xyz$ we have from (36),

$$a_3' = a_1'^3 + 3b.$$

We have also

$$a_4' = \Sigma x^4 = (\Sigma x)^4 - 4\Sigma x^3y - 6\Sigma x^2y^2 - 12\Sigma x^2yz,$$

and since

$$\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz, \qquad \Sigma x^2yz = xyz\,\Sigma x = a_1'b,$$

$$\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2a_1'b,$$

it follows that

$$a_4' = a_1'^4 + 4a_1'b.$$

Therefore

$$a = n^3 \begin{vmatrix} 3 & a_1' & a_1'^2 \\ a_1' & a_1'^2 & a_1'^3 + 3b \\ a_1'^2 & a_1'^3 + 3b & a_1'^4 + 4a_1'b \end{vmatrix}.$$

Upon subtracting $a_1'$ times the second column from the third, and $a_1'$ times the
first from the second, this becomes

$$a = n^3 \begin{vmatrix} 3 & -2a_1' & 0 \\ a_1' & 0 & 3b \\ a_1'^2 & 3b & a_1'b \end{vmatrix} = -n^3 b\,(4a_1'^3 + 27b).$$

Also,

$$a_0a_4 - a_2^2 = n^2\{3(a_1'^4 + 4a_1'b) - (a_1'^2)^2\} = 2n^2(a_1'^4 + 6a_1'b).$$

Hence, by (12),

(37) $$\mu = -\frac{2(a_1'^4 + 6a_1'b)}{nb\,(4a_1'^3 + 27b)}.$$

Differentiating with respect to $a_1'$ to find a minimum, we obtain

$$0 = (4a_1'^3 + 27b)(4a_1'^3 + 6b) - 12a_1'^2(a_1'^4 + 6a_1'b) = 4a_1'^6 + 60a_1'^3b + 162b^2.$$

The minimum of $\mu$, for b fixed, and satisfying the condition $4a_1'^3 + 27b < 0$,
which is equivalent to $a > 0$ since we assume $b > 0$, is attained when $a_1'^3 = bq$
where q is the numerically greater root of the equation $4q^2 + 60q + 162 = 0$;
that is,

$$q = -(15 + \sqrt{63})/2 = -11.468\,626\,97.$$

The elementary symmetric functions of the values x, y, z composing the
distribution are

$$\Sigma x = a_1' = (bq)^{1/3}, \qquad \Sigma xy = 0, \qquad xyz = b.$$

Hence x, y and z must be the roots of the equation in u,

(38) $$u^3 - (bq)^{1/3}u^2 - b = 0.$$

If we put $u = -(bq)^{1/3}v$,

$$v^3 + v^2 + q^{-1} = 0.$$

Calculation gives approximately

$q^{-1} = -.087\,194\,396$, and for the roots of the equation in v,

(39) .2628, -.3729, -.8899,

numbers which are therefore proportional to the values of the variable that
should be chosen when the frequencies must be equal. If any values x, y, z
proportional to these are used, the value (37) of $\mu$ is

(40) $$\mu' = -\frac{6}{N}\,\frac{q + 6}{4q + 27}\,q^{1/3}\,b^{-2/3}$$

and is the minimum consistent with any fixed value b of xyz.
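
The numerical values of q and of the roots (39) can be checked with a short computation. The following is only a sketch: the bisection helper and the bracketing intervals are illustrative choices, not part of the original calculation, and the cubic is taken in the form $v^3 + v^2 + q^{-1} = 0$.

```python
import math

# Numerically greater root of 4q^2 + 60q + 162 = 0.
q = -(15 + math.sqrt(63)) / 2


def cubic(v):
    """Left-hand side of v^3 + v^2 + 1/q = 0."""
    return v**3 + v**2 + 1 / q


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


# Brackets chosen by inspecting the sign of the cubic (an assumption of this sketch).
roots = [
    bisect(cubic, 0.0, 1.0),
    bisect(cubic, -0.6, 0.0),
    bisect(cubic, -1.0, -0.7),
]

print(f"q = {q:.8f}")
print("roots:", [round(r, 4) for r in roots])
```

The product of the three computed roots equals $-q^{-1}$, which is the property used later to rescale them to a prescribed product.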

Choice of the factor of proportionality will involve a compromise between
the criteria of minimum sampling variance and minimum bias. If we ignore
components of bias of orders higher than the fourth and recall (10) and (11) it
will appear that the appropriate combined criterion is that

(41) $$b^2\beta_4^2 + \mu\sigma^2$$

shall be a minimum. Putting for $\mu$ its value $\mu'$ from (40) and differentiating
with respect to b gives

$$2\beta_4^2\,b + \frac{4\sigma^2}{N}\,\frac{q + 6}{4q + 27}\,q^{1/3}\,b^{-5/3} = 0,$$

or

$$b = b' = \left(-\frac{2\sigma^2}{N\beta_4^2}\,\frac{q + 6}{4q + 27}\,q^{1/3}\right)^{3/8}.$$

The product of the three roots (39) is $-q^{-1}$. Numbers proportional to them and
having the product $b'$ will be obtained by multiplying them by $-(b'q)^{1/3}$, that
is, by

$$2.3318\,[\sigma^2/(N\beta_4^2)]^{1/8}.$$

Multiplying (39) by 2.3318 gives numbers

(42) .6128, -.8695, -2.0751,

which must still be multiplied by $\pm[\sigma^2/(N\beta_4^2)]^{1/8}$ to give the set minimizing
$E(\delta b_1)^2$. The ambiguous sign is to be fixed according to the rule at the end of
the last section. Thus we arrive finally at the conclusion:

If the numbers of observations are required to be the same for all the values of the 
variable used, these values should for greatest efficiency deviate from the estimated 
maximizing value by the products of the three numbers (42) by 


(43) $$\pm[\sigma^2/(N\beta_4^2)]^{1/8},$$

choosing the ambiguous sign so as to satisfy (35).

The product $b'$ of the three values is to be substituted for b in (40) and (35),
and the value of $\mu$ thus obtained from (40) is also to be substituted in (35).
These substitutions yield

$$(x + y + z)\,\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) < 0$$

as the criterion for choosing the sign in (43).

The expectation of the square of the error in the estimate of the value $x_0$ of $\xi$
is, according to (9) and (10), given approximately by the ratio of (41) to $4\beta_2^2$,
and it is this that will be a minimum when the foregoing rule is followed. The
minimum of (41) is obtained by replacing b by $b'$ in (40) and (41), and
substituting (40) for $\mu$ in (41). Since $\mu'\sigma^2 = 3\beta_4^2 b'^2$ at the minimum, this gives
$4\beta_4^2 b'^2$; that is,

(44) $$E(\delta b_1)^2 = 4.889\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$$
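
The constants 2.3318 and 4.889 can be recomputed from (40) and (41). This is a sketch under stated assumptions: units are chosen so that $\sigma^2/(N\beta_4^2) = 1$, (41) is taken as $b^2\beta_4^2 + \mu\sigma^2$, and the names `C`, `b_opt` and `multiplier` are illustrative.

```python
import math

q = -(15 + math.sqrt(63)) / 2

# Real cube root of the negative number q.
cbrt_q = -((-q) ** (1 / 3))

# In units where sigma^2 / (N * beta4^2) = 1, (40) reads mu * sigma^2 = C * b**(-2/3).
C = -6 * (q + 6) / (4 * q + 27) * cbrt_q

# Minimizing b^2 + C * b**(-2/3) over b > 0 gives b'**(8/3) = C / 3.
b_opt = (C / 3) ** (3 / 8)

# Multiplier -(b'q)^(1/3); it is positive because q < 0.
multiplier = (-b_opt * q) ** (1 / 3)

# Rescale the roots (39) so that their product is b'.
values = [multiplier * v for v in (0.2628, -0.3729, -0.8899)]

# Minimum of the combined criterion (41) in these units.
min_41 = b_opt**2 + C * b_opt ** (-2 / 3)

print(round(multiplier, 4), [round(v, 4) for v in values], round(min_41, 3))
```

Restoring the units multiplies the three values by $[\sigma^2/(N\beta_4^2)]^{1/8}$ and the minimum by $N^{-3/4}\beta_4^{1/2}\sigma^{3/2}$.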


7. Adjustable frequencies. If the total number N of observations to be made 
can be distributed freely among the values of the variable, the efficiency of the 
experiment can be increased by a proper selection of the individual frequencies
k, m, n along with the corresponding values x, y, z. We shall choose these
six unknowns, subject to the three conditions⁴

(45) $$k + m + n = N,$$

(46) $$xy + xz + yz = 0,$$

(47) $$xyz = -b,$$

to minimize $\mu$. The last condition fixes the quartic bias, the preceding one
expresses the vanishing of the cubic bias. It is of course understood that k, m, n
are all positive, and we shall, as before, suppose initially that b is positive. No
two of x, y, z can be equal, and it follows that none of them, or of the sums
of two of them, can be zero while satisfying the second condition. We shall
lose no generality in assuming that

(48) $$x > y > 0 > z.$$

Furthermore, it is easy to see that $x + y$, $x + z$, and $y + z$ are all positive.
Therefore the quantities

(49) $$r = \frac{y + z}{(x - y)(x - z)}, \qquad s = \frac{x + z}{(x - y)(y - z)}, \qquad t = \frac{x + y}{(x - z)(y - z)},$$

are all positive. From (34) we have

(50) $$\mu = \frac{r^2}{k} + \frac{s^2}{m} + \frac{t^2}{n}.$$

The values of k, m, n making this a minimum while themselves subject to the
limitation that their sum is N must, if they were continuous positive variables,
be proportional to r, s and t. Of course the frequencies are integers, but we are
supposing N large, so that the values found by differentiation will be close
approximations, and we shall disregard this complication. Put therefore

(51) $$r = k\rho, \qquad s = m\rho, \qquad t = n\rho,$$

where $\rho$ is a multiplier which evidently is not zero. If we use these equations
to eliminate r, s, t from $\mu$ we obtain, with the help of (45), $\mu = N\rho^2$. But if we
use them to eliminate k, m, n from (50) we have instead,

$$\mu = (r + s + t)\rho.$$

Now from (49),

(52) $$r + t = s,$$

so that $\mu = 2s\rho$. Therefore $N\rho = 2s$, and finally $\mu = 4s^2/N$. Therefore $\mu$ is
a minimum when the positive quantity s is a minimum. In the expression
(49) for s we substitute from (46) and (47)

(53) $$x + z = -xz/y = b/y^2, \qquad (x - y)(y - z) = (x + z)y - xz - y^2 = 2b/y - y^2,$$

so that

(54) $$s = \frac{b}{y(2b - y^3)}.$$

Since y, s and b are positive, this shows that $y^3 < 2b$. The value of y, with $y^3$ on
the interval from 0 to $2b$, making s a minimum is found by differentiation to be

$$y = (b/2)^{1/3}.$$

Substituting this in (53) and (47) gives

$$x + z = 2^{2/3}b^{1/3}, \qquad xz = -2^{1/3}b^{2/3},$$

whence

(55) $$x = (b/2)^{1/3}(1 + \sqrt{3}), \qquad y = (b/2)^{1/3}, \qquad z = (b/2)^{1/3}(1 - \sqrt{3}).$$


4 The condition (47) is here used instead of (31), from which it differs by the introduction
of the negative sign, because it simplifies the argument of this section slightly to have the
quantities (49) positive. There is no essential difference, since we are seeking a pair of
opposite distributions.

From (45), (51) and (52) it is seen that $k + n = m = N/2$. Thus half the total
observations are to be concentrated on the middle value. From (51) and (49)
we have also

$$\frac{k}{n} = \frac{r}{t} = \frac{y^2 - z^2}{x^2 - y^2},$$

wherefore

$$k = \frac{N}{2}\,\frac{y^2 - z^2}{x^2 - z^2}, \qquad n = \frac{N}{2}\,\frac{x^2 - y^2}{x^2 - z^2}.$$

With (55) this shows that

(56) $$k = N(2 - \sqrt{3})/8 = .03349\,N, \qquad m = N/2, \qquad n = N(2 + \sqrt{3})/8 = .46651\,N.$$
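
The allocation (56) and the constraints (46) and (47) can be verified numerically from (49), (51) and (55). The function below is a minimal sketch; its name and return layout are illustrative only, and it treats the frequencies as continuous, as the text does.

```python
import math


def optimal_design(b, N):
    """Values x, y, z from (55), frequencies proportional to r, s, t, and mu from (50)."""
    c = (b / 2) ** (1 / 3)
    x, y, z = c * (1 + math.sqrt(3)), c, c * (1 - math.sqrt(3))

    # The quantities (49).
    r = (y + z) / ((x - y) * (x - z))
    s = (x + z) / ((x - y) * (y - z))
    t = (x + y) / ((x - z) * (y - z))

    # Frequencies proportional to r, s, t with total N, as in (51).
    total = r + s + t
    k, m, n = (N * w / total for w in (r, s, t))

    mu = r**2 / k + s**2 / m + t**2 / n
    return (x, y, z), (k, m, n), mu


(x, y, z), (k, m, n), mu = optimal_design(b=2.0, N=1000)
print("constraints:", x * y + x * z + y * z, x * y * z)
print("frequencies:", round(k, 2), round(m, 2), round(n, 2))
print("mu:", mu, "vs (16/9N)(2/b)^(2/3):", 16 / (9 * 1000))
```

With b = 2 the scale factor $(b/2)^{1/3}$ is unity, so the printed frequencies are simply 1000 times the proportions in (56).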


We have seen that $\mu = 4s^2/N$. Substituting in (54) the value found for y
gives $s = 2^{4/3}b^{-1/3}/3$. Therefore the minimum of $\mu$ for a fixed value of b is

(57) $$\mu = (16/9N)(2/b)^{2/3}.$$

Inserting this in the expression (41) for the total expectation of the squared
error and then differentiating with respect to b gives

(58) $$b = 2^{7/4}\,3^{-9/8}\,N^{-3/8}\,\sigma^{3/4}\,\beta_4^{-3/4}.$$

When this value is given to b, (41) becomes

(59) $$3.8207\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$$

The greater efficiency of experiments with the frequencies (56) and the
correspondingly adjusted values x, y, z, in comparison with the case in which the
frequencies must be equal, corresponds to the smaller coefficient in (59) than in
(44). To obtain as great accuracy with equal frequencies as with adjusted
ones it is necessary to have more observations, in a ratio obtained by equating
(59) with (44) after inserting different symbols for N in the two cases. In this
way it is found that the number of observations required with efficient
distribution of the frequencies is almost exactly 72 per cent of the number required
when the frequencies are equal, if the values x, y, z are in each case given their
most efficient values.
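
The coefficient in (59) and the 72 per cent figure follow from a few lines of arithmetic. This sketch again works in units where $\sigma^2/(N\beta_4^2) = 1$ and takes the coefficient 4.889 of (44) directly from the text; the variable names are illustrative.

```python
# Optimal b from (58) in units where sigma^2 / (N * beta4^2) = 1.
b_opt = 2 ** (7 / 4) * 3 ** (-9 / 8)

# At the minimum the variance term is three times the bias term, so (41) = 4 * b^2.
coef_adjustable = 4 * b_opt**2

# Coefficient of (44) for equal frequencies, as quoted in the text.
coef_equal = 4.889

# Both expressions vary as N**(-3/4); equating them gives the ratio of sample sizes.
ratio = (coef_adjustable / coef_equal) ** (4 / 3)

print(round(coef_adjustable, 4), round(ratio, 3))
```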

Substituting (58) in (55) gives the numbers 

(60) 2.1520, .7877, -.2110, 

multiplied by (43), with a change of signs if necessary to satisfy (35), as the
values x, y, z of the variable to be used. The more concentrated character of
this distribution with adjustable frequencies is emphasized by the small
proportion, less than 3½ per cent, of the frequencies (56) that pertains to the value most
remote from the tentative maximizing value.

When (58) is substituted in (57) and, with the result, in (35), this inequality 
reduces to exactly the same form as that obtained in the preceding section for 
fixing the sign of (43). 

8. Introduction to the two-variable problem. Functions of two or more
variables are of greater practical importance than functions of one variable.
The recent work on factorial experiments [8] makes it clear that in the
experimental determination of maxima of functions of several variables, considerable
improvements are possible over the practice of trying the effect of variations 
in only one variable at a time while holding the others constant. It seems likely 
that the methods worked out in the previous sections for experimenting with 
one variable are capable of generalization. However certain difficulties enter 
which have not yet been surmounted. The object of the present section is to 
indicate something of the nature of the problem of extending the foregoing 
results to two variables, x and y. 

Let us suppose that a quadratic regression equation,

$$Z = b_{00} + b_{10}x + b_{01}y + \tfrac{1}{2}(b_{20}x^2 + 2b_{11}xy + b_{02}y^2),$$

will be fitted by least squares to observations of $z = f(x, y)$ based on N
combinations of x and y, each of which represents a point in a plane. Since there are
six coefficients to be determined, there must be at least six distinct points
$(x_1, y_1), \ldots, (x_6, y_6)$. The coefficients in the normal equations may be written
$a_{jk} = \Sigma x^j y^k$, so that $a_{00} = N$. The determinant

$$a = \begin{vmatrix}
a_{00} & a_{10} & a_{01} & a_{20} & a_{11} & a_{02} \\
a_{10} & a_{20} & a_{11} & a_{30} & a_{21} & a_{12} \\
a_{01} & a_{11} & a_{02} & a_{21} & a_{12} & a_{03} \\
a_{20} & a_{30} & a_{21} & a_{40} & a_{31} & a_{22} \\
a_{11} & a_{21} & a_{12} & a_{31} & a_{22} & a_{13} \\
a_{02} & a_{12} & a_{03} & a_{22} & a_{13} & a_{04}
\end{vmatrix}$$


must not vanish. Let the function under investigation be

$$f(x, y) = \Sigma\,\beta_{jk}\,x^j y^k/(j!\,k!),$$

and suppose that $\beta_{10} = 0 = \beta_{01}$, so that the origin is the point sought at which
the first derivatives vanish. We shall assume that

$$\beta = \beta_{20}\beta_{02} - \beta_{11}^2 > 0, \qquad \beta_{20} < 0,$$

implying a definite maximum. The estimates $x_0$, $y_0$ of the maximizing (or
minimizing) values obtained by differentiating Z are

$$x_0 = (b_{11}b_{01} - b_{02}b_{10})/b, \qquad y_0 = (b_{11}b_{10} - b_{20}b_{01})/b,$$

where

$$b = b_{20}b_{02} - b_{11}^2.$$

For large samples and values of x and y taken not too far from the origin, b will
approximate to $\beta$, and $x_0$ and $y_0$ respectively to

$$(\beta_{11}b_{01} - \beta_{02}b_{10})/\beta, \qquad (\beta_{11}b_{10} - \beta_{20}b_{01})/\beta.$$
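
The expressions for $x_0$ and $y_0$ are simply the solution of the two linear equations obtained by setting the first derivatives of Z to zero. A minimal sketch follows; the function name and the coefficient values in the example are hypothetical.

```python
def stationary_point(b10, b01, b20, b11, b02):
    """Stationary point of Z = b00 + b10*x + b01*y + (b20*x^2 + 2*b11*x*y + b02*y^2)/2."""
    b = b20 * b02 - b11**2
    if b == 0:
        raise ValueError("b20*b02 - b11^2 vanishes; no unique stationary point")
    x0 = (b11 * b01 - b02 * b10) / b
    y0 = (b11 * b10 - b20 * b01) / b
    return x0, y0


# Hypothetical fitted coefficients, for illustration only.
b10, b01, b20, b11, b02 = 1.0, -2.0, -2.0, 0.5, -3.0
x0, y0 = stationary_point(b10, b01, b20, b11, b02)

# Both first derivatives of Z should vanish at (x0, y0).
grad_x = b10 + b20 * x0 + b11 * y0
grad_y = b01 + b11 * x0 + b02 * y0
print(x0, y0, grad_x, grad_y)
```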


Some means is needed of combining into one the two desiderata of minimizing
the errors $x_0$ and $y_0$. A combined measure of these deviations is

$$\beta_{20}x_0^2 + 2\beta_{11}x_0y_0 + \beta_{02}y_0^2.$$

This expression is constant except for terms of higher order when $x_0$ and $y_0$,
while remaining small, vary in such a way that $f(x, y)$ maintains a constant
value. Substituting in it the approximate values of $x_0$ and $y_0$ gives $\beta^{-1}$ times

$$\beta_{02}b_{10}^2 - 2\beta_{11}b_{10}b_{01} + \beta_{20}b_{01}^2.$$

The expectation of this measure of error may be separated into two parts by
means of the formulae for the variances and covariance,

$$\sigma^2_{b_{10}} = Eb_{10}^2 - (Eb_{10})^2, \qquad \sigma_{b_{10}b_{01}} = Eb_{10}b_{01} - (Eb_{10})(Eb_{01}), \quad \text{etc.}$$

One of these parts is a generalized sampling variance,

$$\beta_{02}\sigma^2_{b_{10}} - 2\beta_{11}\sigma_{b_{10}b_{01}} + \beta_{20}\sigma^2_{b_{01}},$$

and tends to zero with order $N^{-1}$ as N increases provided the values $(x_k, y_k)$ are
fixed. The other part,

(61) $$\beta_{02}(Eb_{10})^2 - 2\beta_{11}(Eb_{10})(Eb_{01}) + \beta_{20}(Eb_{01})^2,$$

is a bias which does not tend to zero as N increases, but which may be kept
arbitrarily small, at the expense of the sampling variance, by restricting the
values $(x_k, y_k)$ to be sufficiently small. This expression is a negative definite
quadratic form in $Eb_{10}$ and $Eb_{01}$, and therefore cannot be zero unless both these
components of bias vanish separately.

We may proceed as in paragraph 2 to express $Eb_{10}$ and $Eb_{01}$ in terms of the
coefficients of $f(x, y)$ of orders higher than the second, among which those of
third order will be of leading importance. In this way it may be shown that,
if we neglect terms in $f(x, y)$ of orders higher than the third, $Eb_{10}$ and $Eb_{01}$ are
given by the ratios to a constant multiple of a of determinants obtained from
a by replacing respectively the second and the third columns by the column

$$\begin{matrix}
\beta_{30}a_{30} + 3\beta_{21}a_{21} + 3\beta_{12}a_{12} + \beta_{03}a_{03} \\
\beta_{30}a_{40} + 3\beta_{21}a_{31} + 3\beta_{12}a_{22} + \beta_{03}a_{13} \\
\beta_{30}a_{31} + 3\beta_{21}a_{22} + 3\beta_{12}a_{13} + \beta_{03}a_{04} \\
\beta_{30}a_{50} + 3\beta_{21}a_{41} + 3\beta_{12}a_{32} + \beta_{03}a_{23} \\
\beta_{30}a_{41} + 3\beta_{21}a_{32} + 3\beta_{12}a_{23} + \beta_{03}a_{14} \\
\beta_{30}a_{32} + 3\beta_{21}a_{23} + 3\beta_{12}a_{14} + \beta_{03}a_{05}.
\end{matrix}$$

It is desirable to select a distribution of points $(x_k, y_k)$ such that these
components of bias will vanish, no matter what may be the values of $\beta_{30}$, $\beta_{21}$, $\beta_{12}$ and
$\beta_{03}$. For this it is necessary and sufficient that all the determinants vanish
that are obtained from these two by replacing the column written above by the
terms in it that multiply any one of the four $\beta_{jk}$'s. The single-variable analogy
suggests using a distribution having the smallest possible number of points,
which in this case is six. Let us now take N = 6. The eight determinants will
all be multiples of

$$P = \begin{vmatrix}
1 & x_1 & y_1 & x_1^2 & x_1y_1 & y_1^2 \\
1 & x_2 & y_2 & x_2^2 & x_2y_2 & y_2^2 \\
\vdots & & & & & \vdots \\
1 & x_6 & y_6 & x_6^2 & x_6y_6 & y_6^2
\end{vmatrix}.$$

To save space we shall indicate determinants of this character merely by writing
a single row without subscripts, thus:

$$P = |\,1 \;\; x \;\; y \;\; x^2 \;\; xy \;\; y^2\,|.$$

If we define

$$A'_{jk} = |\,1 \;\; x^jy^k \;\; y \;\; x^2 \;\; xy \;\; y^2\,|,$$

$$A_{jk} = |\,1 \;\; x \;\; x^jy^k \;\; x^2 \;\; xy \;\; y^2\,|,$$

and multiply each of these determinants for which $j + k = 3$ (j, k = 0, 1, 2, 3)
by P, columns by columns, we shall have exactly the determinants whose
vanishing is the condition for nullification of the cubic bias. If we multiply P by
itself in the same way we have $P^2 = a$. Therefore $P \neq 0$. Therefore the
required condition is that the distribution satisfy the eight equations

$$A_{30} = 0, \quad A_{21} = 0, \quad A_{12} = 0, \quad A_{03} = 0,$$

$$A'_{30} = 0, \quad A'_{21} = 0, \quad A'_{12} = 0, \quad A'_{03} = 0,$$

and the inequality $P \neq 0$.

In seeking distributions nullifying the cubic bias we have twelve unknowns
$x_1, \ldots, x_6, y_1, \ldots, y_6$ which must satisfy these eight equations. This
suggests that we give arbitrary values to four of them and then solve for the other
eight by straightforward elimination. Unfortunately, since the eight equations
are each of the tenth degree, reducing to the ninth degree when coordinates
of two of the points are given numerical values, a straightforward elimination
would seem to lead to an equation of degree $9^8 = 43{,}046{,}721$. The number
of algebraic operations in performing the elimination, solving the equation for
one of the unknowns, substituting back, and solving for the others, would be a
large multiple of this number, and would doubtless be sufficient to occupy a
large and efficient computing project for many millenniums. At the end of this
period it might be found that the roots corresponding to the original arbitrary
values chosen were all complex or made P = 0, and were therefore unusable.
Thus indirect and less elementary methods are called for, and some qualitative
investigations of such distributions, if they exist (which is not certain), are in
order.

The set of conditions as a whole is invariant under all non-singular homogeneous
linear transformations of x and y, as is easily proved by making linear
combinations of the columns of each of the determinants $A_{jk}$, $A'_{jk}$ and P, and
by making linear combinations of these determinants themselves. These
linear transformations leave the origin invariant. They have four degrees of
freedom, which is exactly the right number to take care of the excess of
unknowns over equations. This points to the possible existence of a finite number
of fundamental solutions, from which all solutions may be obtained by linear
homogeneous transformations. Geometrical properties of the configuration will
be represented by invariants under linear transformations. Thus the condition
$P \neq 0$ means that the six points must not all lie on any conic section. From
this it follows at once that no four of them can lie on a straight line, since this
line, with the line through the other two, would constitute a degenerate conic.
As a matter of fact, we can go further and prove that no three of the points
may lie on a straight line. In the proof of this and other properties of the
distribution it is convenient to use the arbitrariness provided by a linear
transformation to pass the axes (which may be oblique) through any two of the six
points, and then to adjust the scales of measurement so that the coordinates of
these points become (1, 0) and (0, 1), except that one of them might conceivably
be the origin. If three points are collinear, their line can be taken to be the
x-axis if it passes through the origin, or the line y = 1 if it does not. Even with
the help provided by such procedures the proofs are rather long, though
straightforward. We shall content ourselves here with stating, without proof, the
following properties necessary for sets of six points for which $P \neq 0$ and all
components of the cubic bias vanish:

No three of the points can lie on a straight line. 

No two straight lines through the origin can contain four of the points. 

No four of the points can lie on the vertices of a parallelogram. 

The set cannot consist of the origin and the vertices of a regular pentagon with 
center at the origin. 

These conditions have been established by calculations of a rather
straightforward and laborious sort, too long to be reproduced.
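
Two of the statements above lend themselves to a direct numerical check: that P vanishes when the six points lie on a conic, and that $P^2 = a$ when N = 6. The following sketch uses exact rational arithmetic; the helper names and the two sets of points are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def design_matrix(points):
    """Rows (1, x, y, x^2, xy, y^2), one per point; its determinant is P."""
    return [[1, x, y, x * x, x * y, y * y] for x, y in points]


def normal_matrix(points):
    """Matrix of the sums a_jk in the order used for the determinant a."""
    rows = design_matrix(points)
    return [[sum(row[i] * row[j] for row in rows) for j in range(6)] for i in range(6)]


# Six points on the parabola y = x^2, hence on a conic: P must vanish.
on_conic = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4), (3, 9)]

# Six points lying on no common conic (illustrative choice).
generic = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]

P = det(design_matrix(generic))
print(det(design_matrix(on_conic)), P, det(normal_matrix(generic)) == P**2)
```

The second set of points only illustrates $P \neq 0$; it is not claimed to nullify the cubic bias.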

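If $z_k = x_k + iy_k$ and $\bar z_k = x_k - iy_k$, the conditions $P \neq 0$, $A'_{jk} = 0 = A_{jk}$,
may be written

$$|\,1 \;\; z \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| \neq 0, \quad |\,1 \;\; z^j\bar z^k \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0, \quad |\,1 \;\; z \;\; z^j\bar z^k \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0.$$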

9. Some further unsolved problems. Since it is useful to demarcate the 
frontiers of knowledge by pointing out what lies a little outside them as well 
as what is within, a few of the many questions may be mentioned which this 
paper falls short of answering. Besides the extension to two variables
mentioned in the last section, and to an arbitrary number of variables, it is desirable
that the whole theory should be developed from an exact, or small-sample, 
point of view rather than on the basis of the large-sample approximations used 
here. This however appears to be an extremely large enterprise. A simpler, 
but still quite difficult, problem is to modify the criteria obtained in paragraphs 
6 and 7 so as to fit problems of economic experimentation, such as those of 
determination of maximum monopoly profit or minimum cost, in which the cost 
of each observation consists largely of the lost profit, or excess cost over the 
minimum, occasioned by the deviation from the value sought. In such a case 
the limitation of cost replaces the limitation of the total number of observations. 

Another important problem is to take account of the inaccuracy of the
preliminary information on which the design of the experiment is based, and to
utilize the relations thus involved to design efficient sequences of experiments. 

Determination of limits of error in terms of the maxima over an interval of the 
derivatives of f(x) should be a fairly straightforward problem in analysis and 
have practical importance. With this are associated various problems dealing 
with maxima of functions having discontinuities in the first or higher derivatives 
at or near the maximum.

An important extension would deal with the case in which the maximum is
estimated from a least-squares polynomial of degree three or more. This might
be connected with the difficult wider problem of deciding on the degree of a
polynomial to be fitted in a particular case.

10. Summary. In determining the value $\xi$ of x for which f(x) is a maximum
or minimum, a quadratic polynomial may be fitted to observations made for
chosen values of x. The errors considered are of two kinds: sampling errors
resulting from the inaccuracy in each observation, which diminish as the number
of observations is increased, but increase if the values of x are chosen too close
to the value sought; and biased errors resulting from the fact that f(x) is not
truly quadratic, which do not decrease when the number of observations
increases with a fixed set of values of x, but do decrease when the deviations of x
from the value sought are reduced. The biased errors may be separated into
components corresponding to the third, fourth and higher powers of $x - \xi$ in
the expansion of f(x), and these components will ordinarily be of diminishing
importance as we go on in the sequence. However it is possible to choose values
of x making the cubic component zero and the quartic component at the same
time a minimum. Such a set consists of only three values of x. These values
may be further adjusted to minimize the expectation of the square of the total
error in $\xi$, as far at least as the term of fourth order in the bias, by a proper
balance between the sampling variance and the quartic bias. The values of x
satisfying these conditions, measured from the true maximizing or minimizing
value $\xi$, are the products of $[\sigma^2/(N\beta_4^2)]^{1/8}$ by the values u in the table below.
Since the root will usually be extracted by logarithms, the common logarithms
of the values are given. The first set is the most efficient when the frequencies
must be equal. The second set is appropriate when the frequencies are made
proportional to the quantities in the last column; in this case only about 72 per
cent as many observations are required for any specified accuracy as when the
frequencies must be equal. The approximate expected squared errors in the
estimates of $\xi$ in the two cases are given respectively by formulae (44) and (59).
All these results are approximations of the kind appropriate to large numbers of
observations.


     Equal frequencies            Adjustable frequencies

      u        log10 u         u        log10 u      Frequency

    -.6128     -.21267       -.2110     -.67572      .46651 N
     .8695     -.06071        .7877     -.10364      .50000 N
    2.0751      .31704       2.1520      .33284      .03349 N


The signs of u should be reversed if $\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) > 0$. Here $\beta_i$ is the
coefficient of $(x - \xi)^i$ in the expansion of f(x), and $\sigma^2$ is the error variance of
an individual observation. For designing an efficient experiment it is necessary
to have some knowledge of these quantities. It may be gained from preliminary
experiments of smaller scale.

A suitable preliminary experiment, where knowledge of the function is
extremely scanty, might consist of a fixed small number, greater than one, of
observations on f(x) corresponding to each of a set of six or more values of x in
arithmetic progression covering an interval that includes the value $\xi$ sought,
and selected with a view to getting $\xi$ in the center of it as nearly as possible.
A polynomial of the fifth degree at least should be fitted by least squares, in
which process all the quantities desired for the design of the later, larger
experiment can be estimated, together with their accuracies. Since the values of x
are taken in arithmetic progression, the fitting can be carried out with extreme
ease by the method of orthogonal polynomials.

Numerous subsidiary questions promise to have both practical importance 
and mathematical interest.


1. Introduction. The words “routine analyses” are used to denote the analyses
performed by laboratories, frequently attached to industrial plants, and
distinguished by the following characteristics: (1) All the analyses or measurements
are of the same kind, for example, are designed to measure the sugar
content in beets or to determine the coordinate of a star. (2) The analyses are
carried out day after day using the same methods and the same instruments.
(3) While all the analyses are of the same kind, the quantity measured varies
from time to time and each such quantity is measured repeatedly n times,
where n represents some small number, 2, 3, 4, 5.

As an illustration we may consider the routine analyses of sugar beets
performed in the process of selection and breeding. A small section is cut out of
each of a great number of sugar beets expected to be suitable for further
breeding. It is crushed and its juice extracted to determine $\xi$, the sugar content of
each particular beet. From the juice available from each beet n samples are
taken and a determination of the sugar content is made from each. Thus, if
$\xi_i$ represents the sugar content of the section from the ith beet and there are
N beets, the laboratory will have to make nN analyses with their results $x_{i,1}$,
$x_{i,2}, \ldots, x_{i,n}$, representing the measurements of the same quantity $\xi_i$.
Obviously the sugar content $\xi_i$ referring to the ith beet need have no relation to
that of any other jth beet.

An essential point in the above description is that the number of measurements
referring to the same quantity $\xi_i$ is usually very small. For example, the
quantitative analyses of urine in certain clinics are performed only twice for
each patient, so that n = 2. Frequently, various practical considerations make
it impossible to increase this number n of analyses intended to measure the same
quantity $\xi_i$.

The smallness of n introduces difficulties in estimating $\xi_i$. It is usual to
consider $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ as independent variables, varying normally about
$\xi_i$ with an unknown standard error $\sigma_i$. If they have to be used to estimate $\xi_i$,
then the confidence interval [1]¹ for $\xi_i$ will be determined by the familiar formula

(1) $$x_{i\cdot} - s_i t_\alpha(n) < \xi_i < x_{i\cdot} + s_i t_\alpha(n),$$

where $x_{i\cdot}$ denotes the mean of the $x_{i,j}$,

(2) $$s_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2 / n(n - 1)$$

and $t_\alpha(n)$ is Fisher's t corresponding to the number of degrees of freedom $n - 1$
and to the chosen confidence coefficient $\alpha$. It is known [2] that if the estimate
of $\xi_i$ is based only on its direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, then the
confidence interval (1) cannot be made any smaller; in fact, formula (1) gives the
shortest unbiased confidence interval for $\xi_i$. But if we try to substitute
appropriate numbers in (1) we get disconcerting results. Namely, if n = 2 and
$\alpha = .99$, then $t_\alpha(n) = 63.657$. If n is increased, the value of $t_\alpha(n)$ decreases
rapidly but for n = 5 it is still very considerable, $t_\alpha(5) = 4.604$, and consequently
the numerical confidence interval determined by (1) is frequently so broad that
it is devoid of practical value.
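
To see how broad the interval (1) becomes for n = 2, formulae (1) and (2) can be applied to a pair of measurements. This is only a sketch: the function name and the two readings are hypothetical, and the value of t is the one quoted in the text rather than one computed from the t distribution.

```python
import math


def confidence_interval(measurements, t_value):
    """Interval (1), with s_i computed from formula (2)."""
    n = len(measurements)
    mean = sum(measurements) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in measurements) / (n * (n - 1)))
    return mean - s * t_value, mean + s * t_value


# Two hypothetical sugar-content readings (per cent) for a single beet.
low, high = confidence_interval([15.2, 15.6], t_value=63.657)
print(round(low, 2), round(high, 2))
```

Although the two readings differ by only 0.4, the resulting 99 per cent interval is about 25 units wide.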

The general conclusion is that, if n cannot be increased, satisfactory estimates
of $\xi_i$ can only be obtained when they are based on something else in addition to
the direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. This point was first noticed by
“Student” [3]. His method of avoiding the difficulty consists in assuming that
the accuracy of measurements performed in the same laboratory is constant
in time, so that $\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$. If this is true, then $s_0^2 = \Sigma s_i^2/N$ will
be an unbiased estimate of the variance of $x_{i\cdot}$, based on $N(n - 1)$ degrees of
freedom. If the past experience of the laboratory is of any size, as measured
by N, then the product $N(n - 1)$ will be of considerable size and the confidence
interval for $\xi_i$

(3) $$x_{i\cdot} - s_0 t_\alpha(N(n - 1) + 1) < \xi_i < x_{i\cdot} + s_0 t_\alpha(N(n - 1) + 1)$$

will be much more satisfactory than (1).
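
The pooled quantity $s_0$ and its degrees of freedom can be computed as follows. This is a minimal sketch that assumes every quantity is measured the same number n of times; the function name and the sample data are hypothetical.

```python
import math


def pooled_standard_error(groups):
    """Return s_0, with s_0^2 the mean of the s_i^2 of formula (2), and N(n - 1)."""
    N = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("every quantity must be measured the same number of times")
    s_squared = []
    for g in groups:
        mean = sum(g) / n
        s_squared.append(sum((x - mean) ** 2 for x in g) / (n * (n - 1)))
    return math.sqrt(sum(s_squared) / N), N * (n - 1)


# Hypothetical duplicate measurements on three quantities.
s0, dof = pooled_standard_error([[1.0, 3.0], [2.0, 2.0], [5.0, 9.0]])
print(round(s0, 4), dof)
```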

The problem which arises is whether we are entitled to assume that $\sigma_1 =
\sigma_2 = \cdots = \sigma_N$. The first study of this problem seems to have been made by
Przyborowski [4] in a paper written in Polish. His findings, subsequently
reported [5] in English, show that, at least in certain cases, the accuracy of routine
analyses is quite difficult to keep constant. If it is not constant, then the
relative frequency of the cases where formula (3) gives correct statements about $\xi_i$
will generally be different from the expected $\alpha$.


1 Figures in square brackets refer to the literature quoted at the end of the paper.

The procedure employed by Przyborowski to test whether $\sigma_1 = \sigma_2 = \cdots = \sigma_N$
consisted in considering the quantities $v_i = (n - 1)s_i^2$ and applying the $\chi^2$ test
to see whether they follow the same $\chi^2$ distribution with $n - 1$ degrees of freedom

(4) $$p(v) = c\,v^{\frac{1}{2}(n - 3)}\,e^{-nv/2\sigma^2},$$

with an unknown $\sigma$.

Just this point is to be the main subject of this paper. The $\chi^2$ test was
devised by Karl Pearson with no particular set of alternative hypotheses in view.
As a result we may expect that in many cases other tests may be devised which
would be more powerful. A number of such cases are already on record [6],
[7], [8].


2. Statistical hypothesis H to be tested. We shall consider the case where 
we can observe the particular values of Nn random variables $x_{i,j}$, i = 1, 2, 
..., N; j = 1, 2, ..., n, and we know that $x_{i,j}$ is independent of $x_{k,l}$ for $i \neq k$ 
and that 

(5) $p(x_{i,1}, \ldots, x_{i,n}) = \left(\frac{1}{\sigma_i\sqrt{2\pi}}\right)^{n} e^{-\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2/2\sigma_i^2}$ 

with unknown values of $\xi_i$ and $\sigma_i > 0$. The hypothesis H to be tested is that 
$\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$ without specifying, however, the actual value of $\sigma$. 
It will be noticed that this hypothesis has already been treated by a number 
of authors [9]-[17]. The need for considering it again arises from the fact that 
previously it was tested against the set of alternatives presuming that the $\sigma_1$, 
$\sigma_2, \ldots, \sigma_N$ were positive constants having any values whatsoever. It seems 
to the author that, in the present case, the set of alternatives should be different. 
This will be explained in the next section. It follows that while the hypothesis 
tested is the same as in the papers quoted above, the problem of testing it is 
quite different. 

Let us denote by E the whole set of Nn observable variables. If H is true 
then their elementary probability law will be 

(6) $p(E \mid H) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2/2\sigma^2}$ 

3. General problem of similar regions. The development of the test will 
follow the general lines explained elsewhere [18], [19], [20]. Denoting by W the 
Nn dimensional space of the $x_{i,j}$'s, we want to determine a region w in W having 
the following properties: (a) if the hypothesis tested is true then the probability 
of E falling in w shall have some fixed value chosen in advance, e.g., $\epsilon = .05$ or 
$\epsilon = .01$. This probability is known as the probability of an error of the first 
kind. (b) If H is not true then the probability of E falling in w as determined 
by one of the alternative hypotheses (that we assume likely to be true when H 
is false) shall be as large as possible in a sense that requires further explanation. 
The probability with which this condition is concerned is the complement of the 
probability of an error of the second kind. Once the region w is chosen it will 
be used to test H in this way: if E falls within w, then H will be rejected. 

In the present section we shall deal only with ways of satisfying condition 
(a). The problem is similar to the one recently described by Hotelling [21]. 
The difficulty is that, if H is true, the probability law of E is given by (6) and 
contains N + 1 unspecified parameters, “nuisance” parameters as Hotelling 
very appropriately calls them. If we take just any region w then it is most 
likely that the probability of E falling in it will vary with different values of 
$\sigma, \xi_1, \ldots, \xi_N$. As a matter of fact, if we want the test to be absolutely most 
powerful, or at least relatively so, we must determine not just one single region 
satisfying (a) but actually all such regions or some broad family of them. From 
these we shall then select one which seems most satisfactory from the point of 
view of (b). 

Systematic methods of determining regions of the above kind have already 
been considered [18], [20], [2]. In these publications they are called “similar” 
to the sample space W. The reason for this term is that the whole space W does 
possess the required properties with $\epsilon = 1$. In fact, whatever be the values of 
the nuisance parameters $\sigma, \xi_1, \ldots, \xi_N$, the probability of E falling within W, 
as calculated from (6), is perfectly determined and equals 1. Our problem is 
to find a region w, part of W, with similar properties for $0 < \epsilon < 1$. However, 
in many cases no such regions exist [22]. 

The general methods in the above publications are applicable in the present 
case. However, a recent paper by Cramér and Wold [23] allows a slight improvement 
in presenting the matter. As this is a little involved, it seems desirable 
to take up the whole problem and present it anew. 

Consider then the general case where the probability law of some m observable 
variables $y_1, y_2, \ldots, y_m$, say $p(E \mid \theta_1, \ldots, \theta_s)$, as specified by the hypothesis 
tested, depends on s nuisance parameters $\theta_1, \theta_2, \ldots, \theta_s$. Our problem will 
consist of determining the necessary and sufficient conditions for a region w to 
be similar to the sample space with respect to all these parameters. We shall 
assume that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ satisfies certain limiting conditions. 

Let 

(8) $\varphi_i = \frac{\partial \log p}{\partial \theta_i}, \qquad \varphi_{i,j} = \frac{\partial^2 \log p}{\partial \theta_i\,\partial \theta_j}.$ 

Assume that for all values of i and j = 1, 2, ..., s 

(9) $\varphi_{i,j} = A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\,\varphi_k$ 

where the coefficients $A_{i,j}$ and $B_{i,j,k}$ are independent of the observable variables 
E. Assume also that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ permits indefinite 
differentiation under the sign of the integral taken over any fixed region w in W. 
It is easy to check that the probability law (6) satisfies all of these conditions. 

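In order to find the necessary conditions for the region w to be similar to W 
with respect to $\theta_1, \theta_2, \ldots, \theta_s$, assume that w is actually similar and that, 
consequently, 

(10) $P\{E \in w \mid \theta_1, \ldots, \theta_s\} = \int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv \epsilon$ 

for all possible values of $\theta_1, \theta_2, \ldots, \theta_s$. It follows that the derivatives of all 
orders with respect to $\theta_1, \theta_2, \ldots, \theta_s$ taken from the left side of (10) must be 
identically equal to zero. But we have 

(11) $\frac{\partial}{\partial\theta_i}\int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m = \int\cdots\int_w \frac{\partial \log p}{\partial\theta_i}\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m = \int\cdots\int_w \varphi_i\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv 0$ 

for i = 1, 2, ..., s. 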

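Similarly, using (9) 

(12) $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m = \int\cdots\int_w \Big(\varphi_i\varphi_j + A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\,\varphi_k\Big)\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv 0.$ 

Using (10) and (11), the last identity will be reduced to 

(13) $\frac{1}{\epsilon}\int\cdots\int_w \varphi_i\varphi_j\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv -A_{i,j}$ for i, j = 1, 2, ..., s 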

where the right side does not depend on the particular region w, provided that 
w is similar to the sample space. Considering the identities (11) and (13) 
which were obtained by differentiating (10) twice, we may guess what will 
happen if we differentiate (13) again and again. We may assume, in fact, that, 
whatever be the non-negative integers $k_1, k_2, \ldots, k_s$, we shall obtain 

(14) $\frac{1}{\epsilon}\int\cdots\int_w \prod_{i=1}^{s}\varphi_i^{k_i}\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv M(k_1, k_2, \ldots, k_s),$ 

where $M(k_1, \ldots, k_s)$ is independent of the particular region w, provided that w 
is similar to the sample space with respect to all of the $\theta$'s. Assume that this is 
found for all k's such that $\sum_{i=1}^{s} k_i \le K$; also assume that the sum of the k's in 
(14) is exactly K. Differentiating with respect to $\theta_j$, we obtain 

(15) $\frac{1}{\epsilon}\int\cdots\int_w \Big\{\varphi_j\prod_{i=1}^{s}\varphi_i^{k_i} + \prod_{i=1}^{s}\varphi_i^{k_i}\sum_{t=1}^{s} k_t\,\varphi_t^{-1}\varphi_{t,j}\Big\}\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv \frac{\partial}{\partial\theta_j}M(k_1, \ldots, k_s).$ 

Because of the particular form of $\varphi_{t,j}$, the second expression in the curly brackets 
under the integral is a polynomial in the $\varphi$'s of order not exceeding K. According 
to the assumption made, this expression multiplied by $p(E \mid \theta_1, \ldots, \theta_s)/\epsilon$ and 
integrated over w gives a result which is independent of w. As the right side of 
(15) is also independent of w, we conclude that 

(16) $\frac{1}{\epsilon}\int\cdots\int_w \varphi_j\prod_{i=1}^{s}\varphi_i^{k_i}\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_j + 1, \ldots, k_s)$ 

is also independent of the particular similar region chosen. We have seen that 
(14) is true for $K \le 2$ and that if it is true for K it is true for K + 1, that is, 
it is true in general. 

We may now sum up our findings: if w is a region similar to the sample space 
with respect to all of the $\theta$'s and if $\epsilon$ denotes the value of the integral (10), then, 
whatever be the non-negative integers $k_1, k_2, \ldots, k_s$, the value of the integral 
on the left side of (14) is independent of the particular region w chosen. 

As the whole sample space W is also “similar” with $\epsilon = 1$, it must satisfy this 
identity. This allows us to determine the M's, namely 

(17) $\int\cdots\int_W \prod_{i=1}^{s}\varphi_i^{k_i}\,p(E \mid \theta_1, \ldots, \theta_s)\,dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_s).$ 

It is obvious that the necessary condition above is also sufficient. If a region 
w is such that (14) holds for all systems of non-negative integers then all the 
derivatives of (10) must be identically zero; thus the left side of (10) is independent 
of $\theta_1, \theta_2, \ldots, \theta_s$. 

It will be useful to interpret the above conditions as follows. We start by 
noticing that the left side of (17) represents the product moment of some specified 
order of the $\varphi_1, \varphi_2, \ldots, \varphi_s$ considered as random variables. We shall call 
it the absolute product moment. We will now interpret the left side of (14) 
as a product moment also. For this purpose we shall define a new elementary 
probability law of the y's to be denoted by $p(E \mid w, \theta_1, \ldots, \theta_s)$ and described 
as the relative probability law given w. We shall write it as 

(18) $p(E \mid w, \theta_1, \ldots, \theta_s) = \frac{1}{\epsilon}\,p(E \mid \theta_1, \ldots, \theta_s)$ 

for all of the points E included in w and 

(19) $p(E \mid w, \theta_1, \ldots, \theta_s) = 0$ 

for all other points. With this definition the left side of (14) appears to be the 
expectation of the product $\varphi_1^{k_1} \cdots \varphi_s^{k_s}$ calculated from the relative probability 
law of the y's given w. We will call it the relative product moment given w. 
The final result can now be stated as follows: 

For a region w to be similar to the sample space with respect to $\theta_1, \theta_2, \ldots, \theta_s$ 
it is necessary and sufficient that all the relative moments and product moments 
of $\varphi_1, \varphi_2, \ldots, \varphi_s$ shall equal the corresponding absolute moments and product 
moments. 

In order to make the method of constructing similar regions according to the 
above conditions clear we recall the procedure involved in the calculation of the 
probability laws of any given set of random variables. 

Assume then that the elementary probability law of the original variables is 
given. Fix some values of the parameters $\theta_1, \theta_2, \ldots, \theta_s$, denote the resulting 
probability law by p(E), and consider the problem of finding the elementary 
probability law of $\varphi_1, \varphi_2, \ldots, \varphi_s$ considered as functions of the y's. We shall 
assume that none of the $\varphi$'s can be expressed as a function of the others not 
involving the y's explicitly so that the matrix 

(20) $\begin{pmatrix} \frac{\partial\varphi_1}{\partial y_1} & \frac{\partial\varphi_1}{\partial y_2} & \cdots & \frac{\partial\varphi_1}{\partial y_m} \\ \vdots & \vdots & & \vdots \\ \frac{\partial\varphi_s}{\partial y_1} & \frac{\partial\varphi_s}{\partial y_2} & \cdots & \frac{\partial\varphi_s}{\partial y_m} \end{pmatrix}$ 

is non-singular. In these circumstances it is possible to select m - s functions 
of the y's, say $\psi_{s+1}, \psi_{s+2}, \ldots, \psi_m$, which have continuous second derivatives such 
that the formulae 

(21) $z_i = \varphi_i$ for i = 1, 2, ..., s; $z_j = \psi_j$ for j = s + 1, s + 2, ..., m 

determine a one-to-one transformation of the space W of the y's into the space 
W' of the z's. If w denotes any region in W then it will be transformed into a 
perfectly determined region w' in W'. If E' denotes a point in W' then the 
probability of E' falling in w' will be identical with that of E falling in w. Thus 

(22) $P\{E' \in w'\} = P\{E \in w\} = \int\cdots\int_w p(E)\,dy_1 \cdots dy_m.$ 

Letting J be the Jacobian of the y's with respect to the z's in the transformation 
(21) and using the known formulae for transforming multiple integrals, we have 

(23) $P\{E' \in w'\} = \int\cdots\int_{w'} p(E)]_{E'}\,|J|\,dz_1 \cdots dz_m,$ 

where $p(E)]_{E'}$ denotes the result of substituting the expressions for the y's in 
terms of the z's as obtained from (21) into p(E). It follows that, whatever be 
the region w' in W', the probability of E' falling in it is obtained by integrating 
the function $p(E)]_{E'}\,|J|$ over w'. But this means, according to the usual 
definition, that the product $p(E)]_{E'}\,|J|$ is the elementary probability law of 
the z's. Denoting it by $p(E') = p(z_1, \ldots, z_m)$ we have 

(24) $p(E') = p(E)]_{E'}\,|J|.$ 





Now, to obtain the joint probability law of $\varphi_1, \varphi_2, \ldots, \varphi_s$, or that of $z_1$, 
$z_2, \ldots, z_s$, we must integrate p(E') for all the other z's between their extreme 
limits, formally between $-\infty$ and $+\infty$ for each of the variables concerned, 

(25) $p(\varphi_1, \ldots, \varphi_s) = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} p(E')\,dz_{s+1} \cdots dz_m.$ 

This procedure will be applied when calculating the absolute probability law 
of the $\varphi$'s and also the relative one given w. The only difference will be that in 
the latter case we shall have to start with (18) and (19) instead of the original 
probability law. The space W' and the transformation (21) will be the same 
in both cases. It is important to be clear about the difference between the two 
cases. This is connected with the difference between $p(E \mid \theta_1, \ldots, \theta_s)$ and 
$p(E \mid w, \theta_1, \ldots, \theta_s)$ of (18) and (19). The latter is proportional to the former 
at any point E within the region w but is zero outside of w. As mentioned 
above, the integrations for $z_{s+1}, z_{s+2}, \ldots, z_m$ in (25) should extend formally 
from $-\infty$ to $+\infty$ for each variable. However, the probability law p(E') may 
equal zero within certain parts of this range. Fixing any system of values 
$z_i = \varphi_i$, for i = 1, 2, ..., s, is equivalent to fixing a hypersurface in the space W 
and considering the intersection of planes $z_i$ = constant in the space W'. Denote 
them by $W(\varphi)$ and $W'(\varphi)$, respectively. If we shift the point E or E' 
along $W(\varphi)$ or $W'(\varphi)$ respectively, the variables $z_j = \psi_j$, for j = s + 1, 
s + 2, ..., m, will assume a certain set $S(\varphi)$ of systems of values. When calculating 
the absolute probability law of $\varphi_1, \ldots, \varphi_s$ this set $S(\varphi)$ will be the real 
region of integration in (25); outside of it the function under the integral sign 
will be zero. On the other hand, when calculating the relative probability law 
of $\varphi_1, \ldots, \varphi_s$ given w, the function under the integral (25) is zero as soon as 
the point E moves outside of the region w. Denote by $w(\varphi)$ that part of $W(\varphi)$ 
which is included in w and by $w'(\varphi)$ the corresponding part of $W'(\varphi)$. So, the 
absolute and the relative, given w, probability laws of $\varphi_1, \ldots, \varphi_s$ can be obtained 
by using the formulae 

(26) $p(\varphi_1, \ldots, \varphi_s) = \int\cdots\int_{W'(\varphi)} p(E')\,dz_{s+1} \cdots dz_m$ 

(27) $p(\varphi_1, \ldots, \varphi_s \mid w) = \frac{1}{\epsilon}\int\cdots\int_{w'(\varphi)} p(E')\,dz_{s+1} \cdots dz_m.$ 


Now the method of constructing regions similar to W with respect to $\theta_1$, 
$\theta_2, \ldots, \theta_s$ is clear: to construct any such region it is necessary and sufficient 
to select for each of all possible systems of values of $\varphi_1, \varphi_2, \ldots, \varphi_s$ a part $w(\varphi)$ 
of the hypersurface $W(\varphi)$ and to combine all these parts. The selection of $w(\varphi)$ 
is arbitrary save for the restriction that the probability law (27) have all its 
moments equal to those of (26), identically in the $\theta$'s. This last condition will 
certainly be satisfied if $w(\varphi)$ is so selected that for almost all systems of values 
of $\varphi_1, \varphi_2, \ldots, \varphi_s$ 

(28) $p(\varphi_1, \ldots, \varphi_s \mid w) \equiv p(\varphi_1, \ldots, \varphi_s)$ 

for all values of the $\theta$'s. 

By selecting $w(\varphi)$ in all possible ways that satisfy (28) we obtain an infinity of 
regions similar to W with respect to $\theta_1, \theta_2, \ldots, \theta_s$. They form a family which 
we shall denote by $F(\epsilon)$. However, it is known that in general all the moments 
of $p(\varphi_1, \ldots, \varphi_s \mid w)$ and $p(\varphi_1, \ldots, \varphi_s)$ may be identical without the two probability 
laws being equal almost everywhere. In such cases, the family $F(\epsilon)$ will 
not exhaust all the similar regions. It is important to be able to state whether 
or not $F(\epsilon)$ contains all the similar regions. To ascertain this we may use the 
conditions of Cramér and Wold [23] which are sufficient for the determinateness 
of the problem of moments, that is, for the uniqueness of a function having a 
given set of moments. 

Let 

(29) $\mu_\nu = M(\nu, 0, 0, \ldots, 0) + M(0, \nu, 0, \ldots, 0) + \cdots + M(0, 0, \ldots, 0, \nu).$ 

With this notation the conditions of Cramér and Wold can be stated as follows: 
If any two probability laws, e.g., the probability laws $p(\varphi_1, \ldots, \varphi_s \mid w)$ and 
$p(\varphi_1, \ldots, \varphi_s)$, have all their moments and all their product moments identical 
and if the series 

(30) $\sum_{\nu=1}^{\infty} \mu_{2\nu}^{-1/2\nu}$ 

is divergent, then 

(31) $p(\varphi_1, \ldots, \varphi_s \mid w) = p(\varphi_1, \ldots, \varphi_s)$ 

almost everywhere. 

Therefore, to know whether the family $F(\epsilon)$ defined above exhausts all the 
regions similar to W, we must calculate the even moments of all the $\varphi_i$ and see 
whether the series (30) depending on these moments is divergent. If it is, there 
is no similar region besides the family $F(\epsilon)$. Otherwise, there may be some 
others. These others will be constructed by selecting $w(\varphi)$'s such that the integral 
(27) equals any other probability law having the same moments as (26). 
In such cases, a region w selected, in one way or another, from the family $F(\epsilon)$ 
as the best from the point of view of controlling errors of the second kind will 
only be the relative best. 

It should be mentioned that whether we can always, under the conditions 
considered, select a $w(\varphi)$ on any $W(\varphi)$ that satisfies the identity (28) has not 
yet been proved. However, it seems plausible that the differential equations (9) 
imply the existence of a sufficient set of statistics for $\theta_1, \theta_2, \ldots, \theta_s$. If this is 
so, the possibility of satisfying (28) is guaranteed (see [2], p. 366). 
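4. Regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$. 
We may now return to the original problem and apply our theory to the probability 
law (6). We wish to construct the most general regions similar to the 
sample space with respect to the nuisance parameters $\sigma, \xi_1, \ldots, \xi_N$ unspecified 
by the hypothesis tested. We let 

(32) $\varphi_\sigma = \frac{\partial \log p}{\partial \sigma} = -\frac{Nn}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2,$ 

(33) $\varphi_i = \frac{\partial \log p}{\partial \xi_i} = \frac{n(x_{i\cdot}-\xi_i)}{\sigma^2}.$ 

Then 

(34) $\frac{\partial\varphi_\sigma}{\partial\sigma} = -\frac{3}{\sigma}\varphi_\sigma - \frac{2Nn}{\sigma^2}, \qquad \frac{\partial\varphi_\sigma}{\partial\xi_i} = \frac{\partial\varphi_i}{\partial\sigma} = -\frac{2}{\sigma}\varphi_i, \qquad \frac{\partial\varphi_i}{\partial\xi_i} = -\frac{n}{\sigma^2}, \qquad \frac{\partial\varphi_i}{\partial\xi_j} = 0 \ \ (i \neq j),$ 

and we see that the probability law (6) satisfies the differential equations (9). 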
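The relations (34) can be checked numerically. The following is only a sketch: it codes (32) and (33) as read from the text, differentiates them by central differences, and compares the result with the right sides of (34). The function names, the step h, and the sample values are illustrative assumptions and not part of the original argument.

```python
import math
import random

def phi_sigma(x, sigma, xi):
    """Equation (32): derivative of log p with respect to sigma."""
    N, n = len(x), len(x[0])
    ss = sum((x[i][j] - xi[i]) ** 2 for i in range(N) for j in range(n))
    return -N * n / sigma + ss / sigma ** 3

def phi_xi(x, sigma, xi, i):
    """Equation (33): derivative of log p with respect to xi_i."""
    n = len(x[i])
    mean_i = sum(x[i]) / n
    return n * (mean_i - xi[i]) / sigma ** 2

def shifted(values, index, delta):
    """Copy of values with one entry moved by delta."""
    out = list(values)
    out[index] += delta
    return out

def max_discrepancy(x, sigma, xi, h=1e-5):
    """Largest gap between finite differences and the right sides of (34)."""
    N, n = len(x), len(x[0])
    # d(phi_sigma)/d(sigma) = -(3/sigma) phi_sigma - 2Nn/sigma^2
    lhs = (phi_sigma(x, sigma + h, xi) - phi_sigma(x, sigma - h, xi)) / (2 * h)
    rhs = -3.0 / sigma * phi_sigma(x, sigma, xi) - 2.0 * N * n / sigma ** 2
    gaps = [abs(lhs - rhs)]
    for i in range(N):
        target = -2.0 / sigma * phi_xi(x, sigma, xi, i)
        # d(phi_sigma)/d(xi_i) = -(2/sigma) phi_i
        lhs = (phi_sigma(x, sigma, shifted(xi, i, h))
               - phi_sigma(x, sigma, shifted(xi, i, -h))) / (2 * h)
        gaps.append(abs(lhs - target))
        # d(phi_i)/d(sigma) = -(2/sigma) phi_i
        lhs = (phi_xi(x, sigma + h, xi, i) - phi_xi(x, sigma - h, xi, i)) / (2 * h)
        gaps.append(abs(lhs - target))
        # d(phi_i)/d(xi_i) = -n/sigma^2
        lhs = (phi_xi(x, sigma, shifted(xi, i, h), i)
               - phi_xi(x, sigma, shifted(xi, i, -h), i)) / (2 * h)
        gaps.append(abs(lhs + n / sigma ** 2))
    return max(gaps)

random.seed(0)
xi = [0.5, -1.0, 2.0]
x = [[random.gauss(m, 1.3) for _ in range(4)] for m in xi]
print(max_discrepancy(x, 1.3, xi))  # small, of the order of the differencing error
```

Each right side of (34) is linear in the $\varphi$'s with coefficients free of the observations, which is exactly the form required by (9).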

Now the hypersurfaces $W(\varphi)$ of the theory are the intersections of the hypersurfaces 

(35) $\varphi_\sigma$ = constant and $\varphi_i$ = constant, for i = 1, 2, ..., N. 

The latter equations are clearly equivalent to 

(36) $x_{i\cdot}$ = constant. 

As to the former, we notice the identity 

(37) $\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2 = n\sum_{i=1}^{N}\big(S_i^2 + (x_{i\cdot}-\xi_i)^2\big) = \chi^2$ (say) 

where $nS_i^2 = \sum_{j=1}^{n}(x_{i,j}-x_{i\cdot})^2$. Therefore, $W(\varphi)$ denotes the intersection of the 
hypersurfaces (36) with, say, 

(38) $T_1 = \sum_{i=1}^{N} S_i^2$ = constant. 

If we succeed in selecting from each hypersurface $W(\varphi)$ a part $w(\varphi)$ satisfying 
condition (28) identically then the sum of all such regions $w(\varphi)$ will form a 
region w similar to W with respect to all the unspecified parameters and belonging 
to the family $F(\epsilon)$. Before proceeding to this stage of the solution, let us see 
whether the family $F(\epsilon)$ exhausts all of the similar regions. 





For this purpose notice first that instead of considering whether there is but 
one probability law with moments equal to those of $\varphi_\sigma$ and the $\varphi_i$'s, it is sufficient 
to concern ourselves with the moments of $\chi^2$ and $x_{i\cdot}$. In fact, all the $\varphi$'s 
are functions of these variables and the problem of uniqueness of the distribution 
must have the same answer in both cases. The $2\nu$th absolute moment of $\chi^2$ 
as calculated from (6) equals 

(39) $(2\sigma^2)^{2\nu}\,\Gamma(\tfrac{1}{2}Nn + 2\nu)/\Gamma(\tfrac{1}{2}Nn).$ 

The same order moment of $x_{i\cdot}$ (taken about $\xi_i$) is 

(40) $\sigma^{2\nu}(2\nu)!/\big((2n)^{\nu}\,\nu!\big).$ 
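As a quick sanity check on (39) and (40), the two expressions can be evaluated for low orders, where the moments are known in closed form. This is a sketch only: the function names are ours, and (39) is coded for a general order so that order = 2ν reproduces the formula of the text.

```python
import math

def chi2_moment(order, sigma, N, n):
    """Moment of the given order of chi^2 = sum (x_ij - xi_i)^2 under (6).

    With order = 2*nu this is expression (39).
    """
    half_df = 0.5 * N * n
    log_value = (order * math.log(2.0 * sigma ** 2)
                 + math.lgamma(half_df + order) - math.lgamma(half_df))
    return math.exp(log_value)

def mean_moment(nu, sigma, n):
    """Expression (40): moment of order 2*nu of a group mean about xi_i."""
    return (sigma ** (2 * nu) * math.factorial(2 * nu)
            / ((2 * n) ** nu * math.factorial(nu)))

sigma, N, n = 1.5, 3, 4
k = N * n
# First and second moments of sigma^2 times a chi-square variable with k degrees of freedom.
print(chi2_moment(1, sigma, N, n), sigma ** 2 * k)
print(chi2_moment(2, sigma, N, n), sigma ** 4 * k * (k + 2))
# Second and fourth central moments of a normal mean with variance sigma^2 / n.
print(mean_moment(1, sigma, n), sigma ** 2 / n)
print(mean_moment(2, sigma, n), 3 * sigma ** 4 / n ** 2)
```

Working with the logarithm of the gamma function keeps the evaluation stable when the order becomes large.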


Thus, the quantity denoted by $\mu_{2\nu}$ in the theory becomes 

(41) $\mu_{2\nu} = \frac{(2\sigma^2)^{2\nu}\,\Gamma(\frac{1}{2}Nn + 2\nu)}{\Gamma(\frac{1}{2}Nn)} + N\left(\frac{\sigma^2}{n}\right)^{\nu}\frac{(2\nu)!}{2^{\nu}\,\nu!}.$ 

We are interested in whether or not the series (30) is divergent. Since $\mu_{2\nu}$ satisfies 
the inequality 

(42) $\mu_{2\nu} < a^{2\nu}\,\Gamma(b + 2\nu) = C_{2\nu}$ (say) 

with $a = 2\sigma^2 + N$ and 2b = Nn, if we prove that the series $\sum C_{2\nu}^{-1/2\nu}$ diverges, then 
(30) also diverges. To settle this conveniently we apply Stirling's formula to 
$\Gamma(b + 2\nu)$ and find that, as $\nu \to \infty$, the ratio $C_{2\nu}^{-1/2\nu}/\nu^{-1}$ tends to a finite limit. As 
the series $\sum \nu^{-1}$ is divergent, so is the series $\sum C_{2\nu}^{-1/2\nu}$ and thus the series $\sum \mu_{2\nu}^{-1/2\nu}$ is 
divergent. Therefore, there is but one probability law with moments identical 
to those of $\chi^2$ and the $x_{i\cdot}$, and so the family $F(\epsilon)$ contains all the regions similar 
to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$. 
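The Stirling step can be illustrated numerically. Assuming $C_{2\nu} = a^{2\nu}\Gamma(b + 2\nu)$ as in (42), Stirling's formula suggests that $\nu\,C_{2\nu}^{-1/2\nu}$ approaches $e/(2a)$, a finite non-zero value, so that the terms behave like those of the harmonic series. The sketch below (function name and parameter values are our own choices) evaluates this product for increasing $\nu$.

```python
import math

def scaled_term(nu, a, b):
    """Return nu * C_{2nu}^(-1/(2nu)) with C_{2nu} = a^(2nu) * Gamma(b + 2nu)."""
    log_c = 2 * nu * math.log(a) + math.lgamma(b + 2 * nu)
    return nu * math.exp(-log_c / (2 * nu))

sigma, N, n = 1.0, 3, 4
a = 2 * sigma ** 2 + N      # a = 2 sigma^2 + N, as in the text
b = 0.5 * N * n             # 2b = Nn
limit = math.e / (2 * a)    # value suggested by Stirling's formula

for nu in (10, 1000, 100000, 10000000):
    print(nu, scaled_term(nu, a, b), limit)
```

The approach to the limit is slow, with a relative gap of the order of $\log \nu / \nu$, but only the existence of a finite non-zero limit matters for the comparison with $\sum \nu^{-1}$.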

It may now be interesting to go into some details of the effective construction 
of any region similar to W with respect to $\sigma, \xi_1, \ldots, \xi_N$. For this purpose it 
is convenient to go back and express the identity (28), that the regions $w(\varphi)$ 
must satisfy, in terms of the relative probability law of $z_{s+1}, z_{s+2}, \ldots, z_m$ given 
$\varphi_1, \varphi_2, \ldots, \varphi_s$. This is denoted by $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ and defined 
for every system of values of the $\varphi$'s for which $p(\varphi_1, \varphi_2, \ldots, \varphi_s) \neq 0$ as 
follows: 

(43) $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \varphi_2, \ldots, \varphi_s) = p(\varphi_1, \ldots, \varphi_s, z_{s+1}, \ldots, z_m)/p(\varphi_1, \ldots, \varphi_s).$ 

Using (26), (27), and (43), the identity (28) can be rewritten in the following form 

(44) $\int\cdots\int_{w'(\varphi)} p(z_{s+1}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)\,dz_{s+1} \cdots dz_m \equiv \epsilon.$ 

The function under this integral is the relative elementary probability law 
of $z_{s+1}, z_{s+2}, \ldots, z_m$ and it is integrated over the region $w'(\varphi)$. Therefore, the 
left side of (44) is nothing but the relative probability of the point E' falling in 
$w'(\varphi)$ given that the first s of its coordinates have the fixed values $\varphi_1, \varphi_2, \ldots, \varphi_s$. 
In other words, and owing to the one-to-one correspondence between the spaces 
W and W', we have 

(45) $P\{E' \in w'(\varphi) \mid E' \in W'(\varphi)\} = P\{E \in w(\varphi) \mid E \in W(\varphi)\} \equiv \epsilon.$ 

Now the general method of determining similar regions may be stated as 
follows: 

1. Choose any system of variables $z_{s+1}, z_{s+2}, \ldots, z_m$ such that their values 
determine uniquely the position of the point E' on any fixed hypersurface $W'(\varphi)$. 
These z's considered as functions of the y's should be continuously differentiable 
twice. 

2. Find the relative probability law of the z's given the $\varphi$'s. This must be 
done for every possible set of values of the $\varphi$'s. 

3. In the space of $z_{s+1}, z_{s+2}, \ldots, z_m$ consider regions which satisfy the equality 
(44) identically in the $\theta$'s. Any such region could be taken to form a part of w, 
the region similar to the sample space, which we are trying to construct. If 
the assumption that the differential equations (9) imply the existence of a sufficient 
system of statistics for $\theta_1, \theta_2, \ldots, \theta_s$ is true, then (see [2], p. 366) the 
probability law $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ will be independent of the 
$\theta$'s and there will be an infinity of regions satisfying (44). 

Obviously, instead of dealing directly with $\varphi_1, \varphi_2, \ldots, \varphi_s$ as described above, 
we may select any system of statistics $T_1, T_2, \ldots, T_s$ such that the system of 
equations $T_i$ = constant is equivalent to $\varphi_i$ = constant, for i = 1, 2, ..., s. 

Returning to the particular problem of similar regions with respect to $\sigma$, 
$\xi_1, \ldots, \xi_N$, we notice that instead of the $\varphi$'s we may consider 

(46) $T_1 = \sum_{i=1}^{N} S_i^2$ and $T_{i+1} = x_{i\cdot}$ for i = 1, 2, ..., N. 

Now we wish to select a convenient system of variables, denoted by $z_{s+j}$'s in 
the theory above, to determine the position of the point E' on any hypersurface 
$W'(\varphi)$ where all the functions (46) have fixed values. Obviously there is no 
unique choice and we shall use what we find convenient. But notice that the 
total number of these variables should be, in our case, Nn - N - 1. The 
following system may be suggested. 

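If the sum $\sum S_i^2$ has a fixed value then none of the $S_i^2$ can exceed $T_1$. Write 

(47) $S_i^2 = T_1 u_i$ for i = 1, 2, ..., N - 1, and $S_N^2 = T_1\big(1 - \sum_{i=1}^{N-1} u_i\big)$ 

and consider $u_1, u_2, \ldots, u_{N-1}$ as belonging to the system of variables sought. 
The region of their variation is determined by the inequalities 

(48) $0 < u_i$ and $\sum_{i=1}^{N-1} u_i < 1.$ 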
(48) 
If the u's are fixed then they, together with the value of $T_1$, determine the 
values of $S_1, S_2, \ldots, S_N$. As the values of $x_{i\cdot} = T_{i+1}$ are already fixed, we 
have to solve the problem of choosing for each i = 1, 2, ..., N a system of 
n - 2 variables, say $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$, which with $x_{i\cdot}$ and $S_i$ will completely 
determine the values of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. However, this will only have to be 
done if n > 2. Following the now familiar method (see, for example, [5], pp. 
33-43), we may determine the $z_{i,j}$ in two consecutive steps. First write 



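(49) 
$x_{i,1} = x_{i\cdot} + \frac{1}{\sqrt{1\cdot 2}}v_{i,1} + \frac{1}{\sqrt{2\cdot 3}}v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}v_{i,n-1}$ 
$x_{i,2} = x_{i\cdot} - \frac{1}{\sqrt{1\cdot 2}}v_{i,1} + \frac{1}{\sqrt{2\cdot 3}}v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}v_{i,n-1}$ 
$x_{i,3} = x_{i\cdot} - \frac{2}{\sqrt{2\cdot 3}}v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}v_{i,n-1}$ 
$\cdots$ 
$x_{i,n} = x_{i\cdot} - \frac{n-1}{\sqrt{(n-1)n}}v_{i,n-1}$ 

where $v_{i,1}, v_{i,2}, \ldots, v_{i,n-1}$ are new variables satisfying the identity 

(50) $\sum_{j=1}^{n-1} v_{i,j}^2 = \sum_{j=1}^{n}(x_{i,j} - x_{i\cdot})^2.$ 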


We transform them further by putting 


(51) 
$v_{i,1} = \sqrt{n}\,S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \cos z_{i,1}$ 
$v_{i,2} = \sqrt{n}\,S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \sin z_{i,1}$ 
$v_{i,3} = \sqrt{n}\,S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \sin z_{i,2}$ 
$\cdots$ 
$v_{i,n-1} = \sqrt{n}\,S_i \sin z_{i,n-2}$ 

with the z's varying as follows 

(52) $0 < z_{i,1} < 2\pi$; $-\pi/2 < z_{i,j} < \pi/2$ for j = 2, 3, ..., n - 2. 

Of course, instead of the $S_i$ we should put their expressions in terms of $T_1$ and 
the u's into (51). With the exception of a set of measure zero, which can be 
ignored, the formulae above determine a one-to-one transformation of the 
original space W of the x's into the space W' of $T_1, T_2, \ldots, T_{N+1}$, $u_1, \ldots, u_{N-1}$, 
and $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$ for i = 1, 2, ..., N. 
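The two steps (49) and (51) can be traced numerically for a single group. The sketch below assumes the usual reading of these formulae: (51) places the point $(v_{i,1}, \ldots, v_{i,n-1})$ on a sphere of radius $\sqrt{n}\,S_i$, and (49) maps the v's and the mean back to the n observations. The function names and the sample values are ours; n is taken to be at least 3 so that the angles exist.

```python
import math

def v_from_angles(n, S, z):
    """Formula (51): v_1..v_{n-1} from S and the angles z = [z_1, ..., z_{n-2}]."""
    radius = math.sqrt(n) * S
    m = n - 1                          # number of v's
    v = []
    for k in range(1, m + 1):
        value = radius
        for t in range(k, m):          # cosines of z_k, ..., z_{n-2}
            value *= math.cos(z[t - 1])
        if k > 1:                      # sine of z_{k-1}
            value *= math.sin(z[k - 2])
        v.append(value)
    return v

def x_from_v(mean, v):
    """Formula (49): observations x_1..x_n from the mean and v_1..v_{n-1}."""
    n = len(v) + 1
    x = []
    for j in range(1, n + 1):
        value = mean
        for k in range(1, n):
            coef = 1.0 / math.sqrt(k * (k + 1))
            if j <= k:
                value += coef * v[k - 1]        # x_1..x_k receive +v_k / sqrt(k(k+1))
            elif j == k + 1:
                value -= k * coef * v[k - 1]    # x_{k+1} receives -k v_k / sqrt(k(k+1))
        x.append(value)
    return x

n, mean, S = 5, 2.0, 0.7
x = x_from_v(mean, v_from_angles(n, S, [1.0, 0.3, -0.4]))
print(sum(x) / n)                          # recovers the mean x_i.
print(sum((t - mean) ** 2 for t in x))     # recovers n * S^2, in agreement with (50)
print(n * S ** 2)
```

Whatever the angles, the reconstructed observations have the prescribed mean and the prescribed value of $S_i$, which is what it means for the z's to locate the point on a fixed hypersurface.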

In calculating the joint probability law of all the new variables, we notice 
that, on the hypothesis tested, all the Nn original variables are mutually independent. 
Consequently, the transformations (49) and (51), which refer to 
separate groups of the $x_{i,j}$'s, corresponding to fixed values of i, could be carried 
through separately. In doing so, we use formulae deduced elsewhere (see [5], 
pp. 38-39) directly and obtain 

(53) $p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2}) = \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} S_i^{n-2}\,e^{-\frac{1}{2}n\left(S_i^2 + (x_{i\cdot}-\xi_i)^2\right)/\sigma^2}\prod_{j=2}^{n-2}\cos^{j-1} z_{i,j}.$ 


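It follows that 

(54) $p(x_{1\cdot}, \ldots, x_{N\cdot}, S_1, \ldots, S_N, z_{1,1}, \ldots, z_{N,n-2}) = \prod_{i=1}^{N} p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2}) = \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\frac{1}{2}n\sum_{i=1}^{N}\left(S_i^2 + (x_{i\cdot}-\xi_i)^2\right)/\sigma^2}\prod_{i=1}^{N} S_i^{n-2}\prod_{k=1}^{N}\prod_{j=2}^{n-2}\cos^{j-1} z_{k,j}.$ 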

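We now wish to introduce $T_1$ and the u's instead of the $S_i$'s. Since all other 
variables remain unchanged the Jacobian of this transformation reduces to that 
of (47). Simple calculations show that 

(55) $\left|\frac{\partial(S_1, S_2, \ldots, S_N)}{\partial(T_1, u_1, \ldots, u_{N-1})}\right| = \left(\tfrac{1}{2}\right)^{N} T_1^{\frac{1}{2}N - 1}\Big(\big(1 - \sum_{i=1}^{N-1} u_i\big)\prod_{i=1}^{N-1} u_i\Big)^{-\frac{1}{2}}.$ 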
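Using this expression and substituting (47) in (54) we finally obtain 

(56) $p(x_{1\cdot}, \ldots, x_{N\cdot}, T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}) = \left(\tfrac{1}{2}\right)^{N}\left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\frac{1}{2}n\sum_{i=1}^{N}(x_{i\cdot}-\xi_i)^2/\sigma^2}\,T_1^{\frac{1}{2}N(n-1)-1}\,e^{-\frac{1}{2}nT_1/\sigma^2}\Big(\big(1 - \sum_{i=1}^{N-1} u_i\big)\prod_{i=1}^{N-1} u_i\Big)^{\frac{1}{2}(n-3)}\prod_{k=1}^{N}\prod_{j=2}^{n-2}\cos^{j-1} z_{k,j}.$ 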

To obtain the relative probability law of $u_1, u_2, \ldots, u_{N-1}$, $z_{1,1}, \ldots, z_{N,n-2}$ 
given $T_1$ and the $T_{i+1} = x_{i\cdot}$, we must calculate $p(T_1, T_2, \ldots, T_{N+1})$ and 
divide expression (56) by it. Of course, $p(T_1, T_2, \ldots, T_{N+1})$ is obtained from 
(56) by integrating over the whole of $W'(\varphi)$, that is, for all other variables between 
the extreme limits of their variation. As these limits are independent of 
the values of $T_1, T_2, \ldots, T_{N+1}$, the result will be 

(57) $p(T_1, T_2, \ldots, T_{N+1}) = c\,e^{-\frac{1}{2}n\sum_{i=1}^{N}(x_{i\cdot}-\xi_i)^2/\sigma^2}\,T_1^{\frac{1}{2}N(n-1)-1}\,e^{-\frac{1}{2}nT_1/\sigma^2}$ 

where c denotes a constant. Thus 

(58) $p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1}) = c_1\Big(\big(1 - \sum_{i=1}^{N-1} u_i\big)\prod_{i=1}^{N-1} u_i\Big)^{\frac{1}{2}(n-3)}\prod_{k=1}^{N}\prod_{j=2}^{n-2}\cos^{j-1} z_{k,j}$ 
with the region of variation $W'(\varphi)$ limited by the following inequalities 

(59) $0 < u_i$, $\sum_{i=1}^{N-1} u_i < 1$; $0 < z_{k,1} < 2\pi$ for k = 1, 2, ..., N; $-\pi/2 < z_{k,j} < \pi/2$ for j = 2, 3, ..., n - 2. 

Since (58) integrated over $W'(\varphi)$ is identically unity, $c_1$ is a purely numerical 
constant. 
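That the relative law of the u's is free of $\sigma$ and the $\xi_i$ can be illustrated by simulation. In the sketch below (names, sample sizes, and seeds are our own choices) the u's are computed as $u_i = S_i^2/T_1$, which is our reading of (47). The u-part of (58) has the form of a Dirichlet density with all parameters equal to $\frac{1}{2}(n-1)$, so each $u_i$ should average 1/N whatever the common $\sigma$.

```python
import random

def simulate_u(N, n, sigma, xi, rng):
    """One sample under H: return [u_1, ..., u_{N-1}] with u_i = S_i^2 / sum_k S_k^2."""
    s2 = []
    for i in range(N):
        group = [rng.gauss(xi[i], sigma) for _ in range(n)]
        mean = sum(group) / n
        s2.append(sum((value - mean) ** 2 for value in group) / n)   # S_i^2
    t1 = sum(s2)
    return [s / t1 for s in s2[:-1]]

def mean_u1(N, n, sigma, xi, reps, seed):
    """Monte Carlo average of u_1 over reps independent samples."""
    rng = random.Random(seed)
    return sum(simulate_u(N, n, sigma, xi, rng)[0] for _ in range(reps)) / reps

N, n = 4, 5
xi = [0.0, 3.0, -2.0, 10.0]
for sigma in (0.5, 5.0):
    print(sigma, mean_u1(N, n, sigma, xi, reps=20000, seed=7), 1.0 / N)
```

The averages stay close to 1/N for both values of $\sigma$, as expected when the conditional law (58) contains no unknown parameter.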

Now to construct any region w similar to the sample space with respect to $\sigma$, 
$\xi_1, \ldots, \xi_N$, we must select, separately for each and all systems $(\varphi)$ of values 
of $T_1, T_2, \ldots, T_{N+1}$, a region $w'(\varphi)$, part of $W'(\varphi)$ as defined by (59), with the 
sole restriction that 

(60) $\int\cdots\int_{w'(\varphi)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\,du_1 \cdots dz_{N,n-2} = \epsilon.$ 

Obviously, there is an infinity of ways of selecting any single one of such 
regions. For example, we could let the u's vary as indicated in (59) and limit 
the z's by 

(61) $0 < z_{k,1} < a$, $-a < z_{k,j} < a$ (k = 1, 2, ..., N; j = 2, 3, ..., n - 2) 

where a is chosen so that (60) is satisfied. This choice of $w'(\varphi)$ may correspond 
to one particular system of values of $T_1, T_2, \ldots, T_{N+1}$ and no other. Again, 
the same region (61) may be chosen to serve for all systems of values of the T's. 
In this case, the region $w = \sum w(\varphi)$ might be described as cylindrical. Any 
such region w will control errors of the first kind in testing H to the same level 
of significance $\epsilon$ and, as far as these errors alone are concerned, each of these 
regions is of equal value. Whatever the choice of regions $w'(\varphi)$ or $w(\varphi)$, the 
test of H will consist of (1) observing the values of the x's, (2) calculating the 
corresponding values of $T_1, T_2, \ldots, T_{N+1}$, the u's, and the z's, and (3) noting 
whether the point with coordinates $u_1, u_2, \ldots, u_{N-1}$, $z_{1,1}, \ldots, z_{N,n-2}$ falls in 
the region $w'(\varphi)$ chosen to correspond to the observed values of $T_1, T_2, \ldots$, 
$T_{N+1}$. Of course, in practical cases, the choice of $w'(\varphi)$ for one system of values 
of the T's will not be quite unconnected with that for others. On the contrary, 
there will probably be some more or less simple rule connecting $w'(\varphi)$ with the 
corresponding systems of the T's. As a result, the actual machinery of the test 
will be much simpler than that described above and will consist of the calculation 
of only a very few functions of the x's and in checking some simple inequalities. 
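Step (2) of the procedure just described can be sketched as follows for the T's and the u's; the angles z are omitted for brevity. The function name is hypothetical, and the u's are taken as $u_i = S_i^2/T_1$, which is our reading of (47).

```python
def similar_region_statistics(groups):
    """From N groups of n observations return (T, u).

    T = [T_1, T_2, ..., T_{N+1}] with T_1 = sum of S_i^2 and T_{i+1} = mean of group i;
    u = [u_1, ..., u_{N-1}] with u_i = S_i^2 / T_1 (assumed form of (47)).
    """
    n = len(groups[0])
    means = [sum(group) / n for group in groups]
    s2 = [sum((value - mean) ** 2 for value in group) / n     # S_i^2 as defined after (37)
          for group, mean in zip(groups, means)]
    t1 = sum(s2)
    return [t1] + means, [s / t1 for s in s2[:-1]]

T, u = similar_region_statistics([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(T)   # T_1 followed by the two group means
print(u)   # share of the first group in T_1
```

A practical test would then compare a few functions of u (and possibly of the z's) with limits chosen so that (60) holds.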

Now our purpose is to select a region from the infinite family $F(\epsilon)$ of all 
regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$ which we judge 
most satisfactory for controlling errors of the second kind. Roughly speaking, 
this region will have to be such that, if the hypothesis H is not true, the observed 
point E will fall in this particular region as frequently as possible, in general. 
Here we come to the necessity of specifying the ways in which we expect the 
hypothesis H to be untrue. It may be untrue in an infinite number of ways. 
For example, the values of the $\sigma$'s may (1) be equally distributed over any given 
range, (2) may fall into just two groups $\sigma_i = 1$ and $\sigma_i = 2$, or (3) all $\sigma_i$'s except 
the last may have the same value $\sigma$ while the last is $10\sigma$, and so forth. Any 
such assumption will be called an hypothesis alternative to H. It is obvious 
that the probability of E falling in any given region w will be different for each 
of them. Therefore, if we wish to deduce a test which will detect the falsehood 
of the hypothesis tested frequently, we must analyse the practical cases where 
the test is to be applied and guess the ways in which the hypothesis tested is 
usually wrong. Then we can deduce a test which will be, in one sense or 
another, most sensitive to the assumed deviations from the hypothesis tested. 
Needless to say, our guess may be right or wrong. In the latter case, an increased 
volume of observational material may demonstrate its fallacy and suggest 
the necessary modifications. In any case, it is important to know exactly 
the class of alternatives for which our test is, in some particular way, the best. 

5. The set of hypotheses alternative to H. Let us consider the routine analyses
made at some laboratory and try to discover the circumstances likely to
cause variation in their accuracy. First of all, we may think of assignable
causes such as a change in personnel, apparatus, or accommodation. These
and similar causes are likely to produce lasting effects; the test of the hypothesis
that they did not reduces to one of the equality of only two $\sigma$'s. An easy
application of known theory [20] shows that the familiar F or z test is unbiased
of type $B_1$, which means that it is preferable to any other. Consequently,
situations of this kind, and also similar ones for which the $L_1$ test is applicable [9],
need not be considered here, so that we may concentrate on cases where there is
no directly assignable cause of variation in the accuracy of the analyses. Assume
then that the personnel, the apparatus, the accommodation, etc., remain
the same. Now the accuracy of analyses depends on a multitude of causes
evading identification, such as changes in the efficiency of the workers. In
principle, they try to have the highest, and therefore a constant, level of accuracy.
Uncontrollable circumstances cause some fluctuations about a certain average
and we expect that small deviations from this average will occur more frequently
than large ones. With this in mind, the author feels that it would be appropriate
to expect that variations in accuracy, if any, will have a random character
so that any $\sigma_i$ referring to one particular group of analyses, or any monotonic
function of that $\sigma_i$, could be considered as an essentially positive random variable,
having some unimodal probability law. To make the problem of the best test
sufficiently specific, we must specify this law entirely. Here we face a somewhat
embarrassing freedom of choice. For lack of more precise information as
to the random variability of $\sigma_i$, we guide ourselves by considerations of ease in
calculations. From this point of view it is convenient to consider the variable
(62) $h = \sigma^{-2}$

and assume that, within a given period of time which is not too long, when the
conditions in a laboratory are sensibly constant, it is varying according to the law

(63) $p(h) = \beta^{\alpha} h^{\alpha-1} e^{-\beta h} / \Gamma(\alpha)$ for $0 < h$,

where $\alpha$ and $\beta$ are unknown non-negative constants. It is useful to express these
constants in terms of two new ones which have an obvious interpretation: $h_0$, the
expectation of h, and $\nu$, the square of the coefficient of variation of h. Easy
calculations give

(64) $\alpha = 1/\nu, \qquad \beta = 1/(h_0 \nu)$.

Now p(h) has the form

(65) $p(h) = \frac{1}{(h_0\nu)^{1/\nu}\,\Gamma(1/\nu)}\, h^{1/\nu - 1}\, e^{-h/(h_0\nu)}$.


We note that when $\nu \to 0$ the probability law (65) tends to a limiting
discontinuous form with $P\{h = h_0\} = 1$. This corresponds to the hypothesis H
that we wish to test. The type of law represented by (65) is known to be
rather flexible. Consequently, we may easily assume that even though the true
variability of h (or $\sigma$) does not exactly correspond to (65), there will be a system
of values of $h_0$ and $\nu$ for which the difference between the true law and (65) will
not be large. Therefore, a test which is particularly sensitive to deviations of $\nu$
from zero with law (65) will be reasonably sensitive in real practical cases.
However, this is an assumption by the author. But it is subject to test and this
will be done below.
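
As a check on the reparametrization (64), the following minimal sketch integrates the density (65) numerically and recovers its total mass, its mean $h_0$ and its squared coefficient of variation $\nu$. The function names and the midpoint-rule settings are illustrative assumptions and are not part of the original paper; the quadrature presumes $\nu < 1$, so that the density is bounded near zero.

```python
import math

def gamma_law_density(h, h0, nu):
    """Density (65): gamma law of h with mean h0 and squared coefficient of variation nu."""
    alpha = 1.0 / nu              # (64): alpha = 1/nu
    beta = 1.0 / (h0 * nu)        # (64): beta = 1/(h0 * nu)
    log_p = (alpha * math.log(beta) + (alpha - 1.0) * math.log(h)
             - beta * h - math.lgamma(alpha))
    return math.exp(log_p)

def moments_by_quadrature(h0, nu, upper_factor=20.0, steps=100_000):
    """Midpoint-rule estimates of (total mass, mean, squared coefficient of variation)."""
    dh = upper_factor * h0 / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        h = (i + 0.5) * dh        # midpoints avoid evaluating log(0)
        p = gamma_law_density(h, h0, nu) * dh
        m0 += p
        m1 += h * p
        m2 += h * h * p
    mean = m1 / m0
    variance = m2 / m0 - mean * mean
    return m0, mean, variance / (mean * mean)

mass, mean, cv2 = moments_by_quadrature(h0=2.0, nu=0.1)
print(mass, mean, cv2)
```

With $h_0 = 2$ and $\nu = 0.1$ the three printed numbers should be close to 1, 2 and 0.1 respectively.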

Formula (63) represents the hypothetical probability law of the variable h
which is not directly observable. We must use this formula to obtain the
probability law of the observable x's alternative to (6), which corresponds to the
hypothesis H being true. Using $h = 1/\sigma^2$, we write the relative probability law
of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ given h

(66) $p(x_{i,1}, \ldots, x_{i,n} \mid h) = \left(\frac{h}{2\pi}\right)^{n/2} \exp\left\{-\frac{h}{2}\sum_{j=1}^{n}(x_{i,j} - \xi_i)^2\right\}$.

Multiplying (66) by (65) we obtain the joint probability law of h and the $x_{i,j}$'s
referring to one group of analyses

(67) $p(h, x_{i,1}, \ldots, x_{i,n}) = \frac{1}{(2\pi)^{n/2}(h_0\nu)^{1/\nu}\,\Gamma(1/\nu)}\, h^{n/2 + 1/\nu - 1} \exp\left\{-h\left(\frac{1}{h_0\nu} + \frac{1}{2}\sum_{j=1}^{n}(x_{i,j} - \xi_i)^2\right)\right\}$.

Integrating (67) with respect to h from zero to infinity, we obtain the absolute
probability law of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, all referring to the ith group of analyses.
Assuming that the value of h in one group of analyses is independent of that in
another, we obtain the joint probability law of all the Nn observable $x_{i,j}$'s by
simply multiplying the probability laws referring to particular groups of n of
them. The result will depend on N + 2 unknown parameters, $\xi_1, \xi_2, \ldots, \xi_N$,
$h_0$, and $\nu$. As the last two will play a more important role than the others
we shall denote the probability law by $p(E \mid h_0, \nu)$. Easy calculations give

(68) $p(E \mid h_0, \nu) = \left(\frac{\Gamma(n/2 + 1/\nu)}{(2\pi)^{n/2}\,\Gamma(1/\nu)}\right)^{N} \prod_{i=1}^{N} \frac{(h_0\nu)^{n/2}}{\left(1 + \frac{1}{2} h_0 \nu \sum_{j=1}^{n}(x_{i,j} - \xi_i)^2\right)^{n/2 + 1/\nu}}$.


We easily check that for $\nu \to 0$ (68) approaches the law (6) with $h_0 = \sigma^{-2}$.
Therefore, the problem that we shall treat below will be to assume that the
observable x's follow (68) with some $h_0 > 0$ and some $\nu \geq 0$ and to test the
hypothesis H that $\nu = 0$. More specifically, we shall try to choose among all
the regions of the family $F(\epsilon)$, found in the preceding section, the one over which
the integral of the function (68) is, in general, the largest.
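
The limiting statement can be checked numerically. The sketch below evaluates the logarithm of one group's factor of (68) for a small $\nu$ and compares it with the logarithm of a normal law of mean $\xi_i$ and precision $h_0 = \sigma^{-2}$. That (6) is this normal law is an assumption made here, since (6) is stated in an earlier section; the function names and sample values are hypothetical.

```python
import math

def log_law_68_one_group(x, xi, h0, nu):
    """Log of the factor of (68) belonging to one group x = (x_1, ..., x_n)."""
    n = len(x)
    q = sum((v - xi) ** 2 for v in x)          # sum of squares about xi
    inv_nu = 1.0 / nu
    return (math.lgamma(n / 2 + inv_nu) - math.lgamma(inv_nu)
            - (n / 2) * math.log(2.0 * math.pi)
            + (n / 2) * math.log(h0 * nu)
            - (n / 2 + inv_nu) * math.log1p(0.5 * h0 * nu * q))

def log_normal_one_group(x, xi, h0):
    """Log density of n independent normal observations with mean xi and precision h0."""
    n = len(x)
    q = sum((v - xi) ** 2 for v in x)
    return (n / 2) * math.log(h0 / (2.0 * math.pi)) - 0.5 * h0 * q

sample = [0.3, -0.5, 1.1]                      # illustrative observations
for nu in (1e-1, 1e-3, 1e-6):
    gap = log_law_68_one_group(sample, 0.2, 1.5, nu) - log_normal_one_group(sample, 0.2, 1.5)
    print(nu, gap)
```

The printed gap should shrink roughly in proportion to $\nu$.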

Before doing so, it may be useful to exhibit some experimental evidence in
favor of the assumption that, if $\sigma$ is not constant in some conditions of analysis
or measurement, then it varies in such a way that the variability of the x's has
at least some characteristics appropriate to (68).

Introduce the notation

(69) $\omega_i = n S_i^2 = \sum_{j=1}^{n} (x_{i,j} - \bar{x}_{i\cdot})^2$.

Using transformations (49), (50), and (69), successively, we easily deduce the
probability law of $\omega_i$

(70) $p(\omega_i) = \frac{(h_0\nu/2)^{\frac{1}{2}(n-1)}\,\Gamma(\frac{1}{2}(n-1) + 1/\nu)}{\Gamma(\frac{1}{2}(n-1))\,\Gamma(1/\nu)} \cdot \frac{\omega_i^{\frac{1}{2}(n-3)}}{(1 + h_0\nu\omega_i/2)^{\frac{1}{2}(n-1) + 1/\nu}}$.


If the hypothesis we have made about the variability of h, as expressed by (65),
is true in any particular case then the sums of squares (69), referring to each
particular group of analyses, are distributed according to (70). The reverse is
not necessarily true, of course, but it is comforting that a check of the above
in a number of broadly divergent circumstances gives satisfactory results. By
applying the transformation $1 + h_0\nu\omega_i/2 = t^{-1}$, the integral of (70) is easily
reduced to an Incomplete Beta function whence Pearson's tables [24] provide
an easy means of calculating the theoretical probability that $\omega_i$ is within any
given limits.
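
The reduction to an Incomplete Beta function can be illustrated as follows. Under the substitution $t = 1/(1 + h_0\nu\omega/2)$ the law (70) becomes a Beta law with parameters $1/\nu$ and $(n-1)/2$, so that $P\{\omega \le w\}$ is the Beta mass above $t_w = 1/(1 + h_0\nu w/2)$. The sketch below, with hypothetical function names and a plain midpoint rule standing in for Pearson's tables, compares this with direct integration of (70); it assumes $n \ge 3$ so that both integrands are bounded.

```python
import math

def density_70(w, n, h0, nu):
    """Density (70) of omega = n * S^2 (n >= 3 assumed in this sketch)."""
    a = 0.5 * (n - 1)             # shape coming from the sum of squares
    b = 1.0 / nu                  # shape coming from the gamma law of h
    c = 0.5 * h0 * nu
    log_p = (a * math.log(c) + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + (a - 1.0) * math.log(w) - (a + b) * math.log1p(c * w))
    return math.exp(log_p)

def cdf_direct(w, n, h0, nu, steps=20_000):
    """P{omega <= w} by the midpoint rule applied directly to (70)."""
    dw = w / steps
    return sum(density_70((i + 0.5) * dw, n, h0, nu) for i in range(steps)) * dw

def cdf_via_beta(w, n, h0, nu, steps=20_000):
    """Same probability as the mass of a Beta(1/nu, (n-1)/2) law on [t_w, 1]."""
    a, b = 1.0 / nu, 0.5 * (n - 1)
    t_w = 1.0 / (1.0 + 0.5 * h0 * nu * w)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    dt = (1.0 - t_w) / steps
    total = 0.0
    for i in range(steps):
        t = t_w + (i + 0.5) * dt
        total += math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))
    return total * dt

print(cdf_direct(4.0, 5, 2.0, 0.1), cdf_via_beta(4.0, 5, 2.0, 0.1))
```

For n = 3 the Beta integral has the closed form $1 - (1 + h_0\nu w/2)^{-1/\nu}$, which gives a convenient independent check of both routines.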

Table I gives several observed distributions of the sums $\omega$ together with their
expected ones, calculated from (70) with the values of $h_0$ and $\nu$ fitted by the
method of moments. The last lines give particulars of the application of the
$\chi^2$ test for goodness of fit.

TABLE I

Comparison of empirical distributions of $\omega$ with those calculated from (70)

Number   Author or source of data                        Kind of measurement or analysis
1        R. T. Birge                                     Strong lines in the band spectra of nitrogen
2        R. T. Birge                                     A solar spectrum line
3        K. Buszczyński and Sons, Ltd.                   Sugar content of beets
4        A. A. Michelson, F. G. Pease, and F. Pearson    Velocity of light
5        W. S. Svenson                                   Octane rating

                 Frequency (Exp. = expected, Obs. = observed)
 ω     |     No. 1     |     No. 2     |     No. 3     |     No. 4     |     No. 5
       |   Exp.   Obs. |   Exp.   Obs. |   Exp.   Obs. |   Exp.   Obs. |   Exp.   Obs.
0-1    |  29.38     29 |  15.10     17 |  15.56     16 |   3.50      2 |  14.90     17
1-2    |  19.30     20 |  13.14     11 |  12.67     17 |   7.73     10 |  18.88     16
2-3    |  13.11     17 |  11.39     15 |  10.70     13 |   9.37     13 |  16.83     14
3-4    |   9.16      7 |   9.84      5 |   8.98      2 |   9.66      8 |  13.93     12
4-5    |   6.56      6 |   8.46      9 |   7.53     11 |   9.28     17 |  11.20     10
5-6    |   4.80      1 |   7.24      9 |   6.34      4 |   8.60      7 |   8.91      7
6-7    |   3.59      4 |   6.17     11 |   5.36      3 |   7.80      7 |   7.04     10
7-8    |   4.80 }    1 |   5.23      4 |   4.54      7 |   6.99      7 |   5.58      9
8-9    |        }    3 |   4.40      2 |   3.86      4 |   6.22      4 |   4.43      7
9-10   |   3.94 }    2 |   3.69      2 |   6.09 }    0 |   5.52      4 |   3.52      7
10-11  |        }    0 |   5.63 }    2 |        }    5 |   4.88      3 |   5.08 }    3
11-12  |        }    0 |        }    1 |   4.45 }    0 |   4.32      5 |        }    1
12-13  |   5.36 }    4 |   3.76 }    3 |        }    0 |   3.82      3 |   4.51 }    0
13-14  |        }    1 |        }    1 |   4.61 }    5 |   6.37 }    2 |        }    1
14-15  |        }    0 |   5.95 }    1 |        }    1 |        }    5 |   6.18 }    0
15-16  |        }    0 |        }    3 |        }    0 |   5.03 }    1 |        }    1
16-17  |        }    1 |        }    1 |   4.37 }    0 |        }    0 |        }    1
17-18  |        }    0 |        }    1 |        }    3 |   4.00 }    3 |        }    1
18-19  |        }    0 |        }    1 |        }    1 |        }    2 |        }    2
19-20  |        }    1 |        }    1 |        }    0 |   4.55 }    2 |        }    0
20-21  |        }    1 |               |        }    1 |        }    0 |        }    0
21-22  |        }    0 |               |   4.94 }    0 |        }    1 |        }    0
22-23  |        }    0 |               |        }    1 |   4.23 }    2 |        }    0
23-24  |        }    0 |               |        }    0 |        }    1 |        }    0
24-25  |        }    0 |               |        }    0 |        }    1 |        }    0
25-26  |        }    0 |               |        }    1 |        }    1 |        }    0
26-32  |        }    2 |               |        }    4 |   3.94      3 |        }    1
32-43  |               |               |        }    1 |   3.58      3 |        }    1
>43    |               |               |               |   3.61      6 |
Total  | 100.00    100 | 100.00    100 | 100.00    100 | 123.00    123 | 120.99    121

χ²                  |  9.63  | 12.67  | 18.75  | 18.09  | 13.35
Degrees of freedom  |  7     | 10     | 11     | 18     | 10
P(χ²)               |  .21   | .24    | .066   | .45    | .21

The symbols } are used to indicate the groupings used in the calculation of
the $\chi^2$; a grouped expected frequency is entered against the first class of its
group. The groupings were made so as to have the expected frequency in a
class at least equal to 3.5.

The origin of the data used to compile Table I is as follows:

For the data providing frequency distributions numbered 1 and 2, the author
is deeply indebted to Professor Raymond T. Birge. The methods of measurement
and their purpose are explained in the publications [25] and [26], respectively.
These papers also contain various compilations of the results of the
measurements. However, the original single measurements, necessary for the
present paper, are naturally unpublished and Professor Birge was kind enough
to find them for the author in his records.

Frequency distribution No. 3 was compiled from a book of records of sugar
beet trials carried out by Messrs. K. Buszczyński and Sons, Ltd. in Górka
Narodowa, Poland.

The 4th distribution was constructed from the original measurements of the 
velocity of light as published [27] by Michelson, Pease, and Pearson. The 
measurements made during single days were treated as forming separate groups. 

Distribution No. 5 originated from repeated measurements of Octane Rating
conducted by a refining company in California. They were made accessible by
Mr. Walter S. Svenson and it is a pleasure to express the author's deep
gratitude to him.

The number of observations in each column is not very large. It may be 
expected that if it were increased, the differences between the hypothetical 
distributions and the observed ones would become more apparent. It seems 
safe, however, to assume that in a number of instances the hypothesis as to the
character of the variability of $\omega_i$ is not in very bad disagreement with the actual
facts. It would be most interesting to have some more data on the subject.
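
The goodness-of-fit entries of Table I can be recomputed from the grouped frequencies. The sketch below does so for distribution No. 2, reading the classes above $\omega = 10$ as the three groups 10-12, 12-14 and 14-20 marked in the table; this reading of the groupings, and the function names, are assumptions of the sketch. With 13 classes and two fitted constants ($h_0$ and $\nu$) there remain 13 - 1 - 2 = 10 degrees of freedom.

```python
import math

# Distribution No. 2 of Table I: ten single classes followed by three grouped classes.
expected = [15.10, 13.14, 11.39, 9.84, 8.46, 7.24, 6.17, 5.23, 4.40, 3.69,
            5.63, 3.76, 5.95]
observed = [17, 11, 15, 5, 9, 9, 11, 4, 2, 2,
            3, 4, 8]

def chi_square(obs, exp):
    """Pearson's chi-square criterion for matched observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi_square_tail_even_df(x, df):
    """P{chi^2 > x} for an even number of degrees of freedom (finite Poisson-type sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

chi2 = chi_square(observed, expected)
df = len(expected) - 1 - 2
print(round(chi2, 2), df, round(chi_square_tail_even_df(chi2, df), 2))
```

The result is about 12.68 with a tail probability of about .24, in line with the entries 12.67 and .24 printed in the table.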

6. The best critical region for testing H against a particular alternative. It
seems unquestionable that the most desirable test of any hypothesis is the
uniformly most powerful test (U. M. P. Test) with respect to the whole class of
simple hypotheses alternative to the one which is being tested. Denote by H
the hypothesis tested, by h any simple admissible hypothesis alternative to H,
and by $\Omega$ the set of all h's. If $w_0$ is the critical region corresponding to the
U. M. P. Test, then $w_0$ has these properties:

(71) (1) $P\{E \in w_0 \mid H\} = \epsilon$.

(2) If w is any other region such that $P\{E \in w \mid H\} = \epsilon$ then

(72) $P\{E \in w_0 \mid h\} \geq P\{E \in w \mid h\}$,

whatever be $h \in \Omega$.

Following the known method [18], we shall see whether a test of the hypothesis
H considered in the preceding sections exists which is a U. M. P. Test with
respect to the whole class of admissible hypotheses that specify the probability
laws (68) with any $h_0 > 0$ and $\nu > 0$.

The method consists of considering one particular alternative hypothesis h',
that is, one particular set of values of $h_0 > 0$ and $\nu > 0$, and finding the best
critical region $w_{h_0,\nu}$ for testing H against h'. If this region appears to depend
on $\nu$ and/or on $h_0$ then there is no U. M. P. Test. The region $w_{h_0,\nu}$ is found by
determining, for each system $(\varphi)$ of values of $T_1, T_2, \ldots, T_{N+1}$ separately, a part
$w_{h_0,\nu}(\varphi)$ determined by the inequality

(73) $p(E \mid h_0, \nu) > k(\varphi)\, p(E \mid H)$

where $k(\varphi)$ is a function of $T_1, T_2, \ldots, T_{N+1}$ so determined that the relation
(60) is satisfied. Substituting (6) and (68) in (73), taking the logarithm of both
sides, and combining all terms which are constant or depend only on
$T_1, T_2, \ldots, T_{N+1}$, we have

(74) $\sum_{i=1}^{N} \log\left(1 + \tfrac{1}{2} n h_0 \nu \left(S_i^2 + (T_{i+1} - \xi_i)^2\right)\right) < k_1(T_1, \ldots, T_{N+1})$, (say).

Clearly, for $T_1, T_2, \ldots, T_{N+1}$ fixed, this inequality imposes a restriction on the
variability of $u_1, u_2, \ldots, u_{N-1}$, while $z_{1,1}, \ldots, z_{N,n-2}$ are allowed to vary
indiscriminately within the extreme limits (52). But the region $w_{h_0,\nu}(\varphi)$ determined
by (74) also depends on the product $h_0\nu$. Therefore, there is no uniformly most
powerful test for testing H against any and all simple alternatives specifying (68).
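
The dependence of the region on the product $h_0\nu$ can be made concrete with a small numerical illustration, which is not taken from the paper. Write $c = \frac{1}{2} n h_0 \nu$ and $q_i = S_i^2 + (T_{i+1} - \xi_i)^2$, so that the left side of (74) is $\sum \log(1 + c\,q_i)$. The two hypothetical points A and B below have the same $\sum q_i$ (taking $\bar{x}_{i\cdot} = \xi_i$, so that $q_i = S_i^2$ and both lie on one hypersurface), yet which of them has the smaller criterion, and so enters the region first, changes with c.

```python
import math

def criterion_74(q, c):
    """Left side of (74): sum of log(1 + c * q_i), with c = n * h0 * nu / 2."""
    return sum(math.log1p(c * qi) for qi in q)

# Hypothetical values of q_i for two sample points with equal totals (N = 3).
point_a = (0.5, 5.75, 5.75)
point_b = (2.0, 2.0, 8.0)

for c in (0.01, 10.0):
    a_val, b_val = criterion_74(point_a, c), criterion_74(point_b, c)
    first = "B" if b_val < a_val else "A"
    print(f"c = {c}: A -> {a_val:.6f}, B -> {b_val:.6f}, enters the region first: {first}")
```

For small c the ordering is governed by $\sum q_i^2$ and for large c by $\prod q_i$, which is why no single region can serve every alternative.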

7. A critical region of an unbiased type. There seem to be no grounds for
dissent that when a U. M. P. Test exists and is readily applicable, it is
preferable to any other test, but the situation is quite different when there is no
U. M. P. Test. In such cases, practical considerations may suggest a variety
of requirements for a second best test of the hypothesis. Among these, we may
suggest the following considerations:

Fix, for a moment, the values of $h_0, \xi_1, \ldots, \xi_N$, take any region w of the
family $F(\epsilon)$, and consider the probability of E falling in w as a function of $\nu$
only. This is called the power function

(75) $\beta(\nu \mid w) = \int \cdots \int_{w} p(E \mid h_0, \nu)\, dx_{1,1} \cdots dx_{N,n}$.

Here, of course, $\nu \geq 0$. Because of the properties of regions belonging to $F(\epsilon)$
we have $\beta(0 \mid w) \equiv \epsilon$. If $\nu > 0$, the value of $\beta(\nu \mid w)$ represents the
corresponding probability of the test (based on w) discovering the falsehood of H.
It is obviously desirable to have this probability as large as possible. In any
case, it should be greater than $\epsilon$. This last restriction is known as that of
unbiasedness [19], [20], [28]. Further, since it is impossible to maximize $\beta(\nu \mid w)$
for all values of $\nu$, we must choose those for which it is most desirable, in our
opinion, to concentrate our efforts to increase $\beta(\nu \mid w)$. One possible point of
view is that these values should be very close to the hypothetical value $\nu = 0$.
For if $\nu$ is considerably larger than zero, we may argue that there will be no
need to apply any refined statistical test to detect the falsehood of H. Of
course, this argument has no mathematical character and its general acceptance
is not suggested. In fact, we may argue that if $\nu$ is greater than zero but very
small, it will be almost impossible to detect the falsehood of H by any test and,
therefore, our efforts should be concentrated on values of $\nu$ which are of
considerable size.

These are considerations of non-mathematical character; the role of
mathematical statistics is limited to devising tests and elucidating their properties.
If these last are understood by practical statisticians, each may choose according
to his problem. Note that what could be termed the "properties" of a test are
summarized in the power function $\beta(\nu \mid w)$ with its relation to the power
functions of other possible tests of the same hypothesis.

In this paper we shall deal with tests particularly sensitive to small deviations
of $\nu$ from its hypothetical value $\nu = 0$. In this respect, our first trial is to find
a region $w_0$, belonging to the family $F(\epsilon)$ and satisfying the condition

(76) $\left[\frac{d\beta(\nu \mid w_0)}{d\nu}\right]_{\nu=0} \geq \left[\frac{d\beta(\nu \mid w)}{d\nu}\right]_{\nu=0}$,

where w is any other region belonging to the same family $F(\epsilon)$.

Because of the peculiar structure of the regions belonging to $F(\epsilon)$, the problem
is immediately reduced to finding regions $w_0(\varphi)$. According to theory explained
elsewhere [18] these should satisfy the condition

(77) $\left[\frac{\partial p(E \mid h_0, \nu)}{\partial \nu}\right]_{\nu=0} > k(T)\, p(E \mid H)$,

where k(T) depends on $T_1, T_2, \ldots, T_{N+1}$ only and is determined to satisfy the
condition of similarity (60). Condition (77) is equivalent to

(78) $\left[\frac{\partial \log p(E \mid h_0, \nu)}{\partial \nu}\right]_{\nu=0} > k(T)$.

Taking the logarithm of (68), differentiating with respect to $\nu$, putting $\nu$ equal
to zero, substituting in (78), and combining all the terms which are constant
on $w(\varphi)$ into a single term, which we may write as a multiple of $h_0^2 k_1(T)$, we have

(79) $\sum_{i=1}^{N} \left(S_i^2 + (T_{i+1} - \xi_i)^2\right)^2 > k_1(T)$.


We note that condition (79) determining, so to speak, the shape of the region
$w_0(\varphi)$ does not imply any restriction on the variability of the z's but only on
the u's. However, the region $w_0(\varphi)$ as determined by (79) has the disadvantage
of being dependent on the values of the $\xi$'s. Since these are not specified by
the hypothesis tested, we are not able to determine the critical regions belonging
to the family $F(\epsilon)$ and maximizing the derivative $[d\beta(\nu \mid w)/d\nu]_{\nu=0}$. The region
which does so for some particular system $\xi_1, \xi_2, \ldots, \xi_N$ of values of the $\xi$'s
will lose this property if the system of values of the $\xi$'s is appropriately changed.
Therefore, our choice of the region maximizing the derivative of the power
function at $\nu = 0$ should be made not from the whole family $F(\epsilon)$ but from a
subfamily $F_1(\epsilon)$ composed only of such regions which also possess the supplementary
property that

(80) $\left[\frac{d\beta(\nu \mid w)}{d\nu}\right]_{\nu=0}$ = constant,

that is, has a value independent of $\xi_1, \xi_2, \ldots, \xi_N$. The determination of this
subfamily $F_1(\epsilon)$ embracing all such regions is an interesting problem. Until it is
solved, we use an obvious subfamily $F_2(\epsilon)$ of regions w which have the desired
property, but we do not know whether or not $F_2(\epsilon)$ contains all such regions.²

The family $F_2(\epsilon)$ is defined as consisting of those regions belonging to $F(\epsilon)$
which could be described as cylindrical with their generators parallel to the
intersection of $T_{i+1} = \bar{x}_{i\cdot}$ = constant, for i = 1, 2, ..., N. In other words
and more precisely, a region w of the family $F(\epsilon)$ belongs to $F_2(\epsilon)$ only if the
question of its including a given point E depends on Nn - N of its coordinates,
namely on $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ and not on $T_2, T_3, \ldots, T_{N+1}$.

We easily show that any region w belonging to $F_2(\epsilon)$ possesses the property
that its power function is independent of the $\xi$'s. Denote by w' the set of
systems of values of $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ corresponding to points
included in any given region w of the family $F_2(\epsilon)$. We see that the power
function $\beta(\nu \mid w)$, equal to the integral of (68) over w, can be calculated by using
the transformations (47), (49), and (51). Then the region of integration for
$T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ is what we have just denoted by w' and the
integrations for $T_{i+1} = \bar{x}_{i\cdot}$ extend from $-\infty$ to $+\infty$ irrespective of the fixed
values of the other variables. These integrations are easily carried out by
substituting

(81) $\tfrac{1}{2} n h_0 \nu (\bar{x}_{i\cdot} - \xi_i)^2 = (1 + \tfrac{1}{2} n h_0 \nu S_i^2)\, t_i^2$.

The final result is

(82) $\beta(\nu \mid w) = \int \cdots \int_{w'} p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})\, dT_1 \cdots dz_{N,n-2}$.

Here

(83) $p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}) = c(\nu)\,\phi(T_1, u, z) \Big/ \prod_{i=1}^{N} \left(1 + \tfrac{1}{2} n h_0 \nu S_i^2\right)^{\frac{1}{2}(n-1) + 1/\nu}$,

where $c(\nu)$ denotes a constant depending on $\nu$, $\phi(T_1, u, z)$ denotes a function of
all the N(n - 1) variables involved, independent of $\nu$, and $S_i^2$ denotes expressions
(47) for short. We see that (82) is independent of the $\xi$'s.

Since the region w belongs to $F(\epsilon)$, it is composed of sections $w(\varphi)$ selected
separately on each hypersurface $T_1$ = constant and $T_{i+1}$ = constant, i = 1,
2, ..., N. Because of the definition of the family $F_2(\epsilon)$, the sections $w(\varphi)$ are
independent of $T_2, T_3, \ldots, T_{N+1}$ so that each of them can be selected only in
accordance with the value of $T_1$. Therefore, we may denote them by $w(T_1)$.
As far as property (80) is concerned, the choice is arbitrary. But the property
of similarity requires the fulfillment of condition (60) which, in the present case,
reduces to

(84) $\int \cdots \int_{w(T_1)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \epsilon$.

² Regions with the property (80) and belonging to $F(\epsilon)$ but not to $F_2(\epsilon)$ exist. Probably,
however, each of them differs from one of the regions of $F_2(\epsilon)$ by a set of measure zero only.

Applying the method already used, we find that sections $w(T_1)$ of the region w
belonging to $F_2(\epsilon)$ and maximizing the derivative $[d\beta(\nu \mid w)/d\nu]_{\nu=0}$ are determined,
separately for each value of $T_1$, by the inequality

(85) $\left[\frac{\partial \log p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})}{\partial \nu}\right]_{\nu=0} > k_2(T_1)$,

where $k_2(T_1)$ denotes a function of $T_1$ determined to satisfy (84).
Substituting (83) in (85) we easily find that this condition is equivalent to

(86) $\zeta = \sum_{i=1}^{N-1} u_i^2 + \left(1 - \sum_{i=1}^{N-1} u_i\right)^2 > k_3(T_1)$

where, again, $k_3(T_1)$ is determined for each particular value of $T_1$ to satisfy (84).
As (86) does not imply any restrictions on the variability of $z_{1,1}, z_{1,2}, \ldots, z_{N,n-2}$,
the integrations for the z's while calculating (84) must be carried out over the
extreme limits (52). This will reduce the integrand to the relative probability
law of $u_1, u_2, \ldots, u_{N-1}$ given all the T's. This law is easily calculated from
(58) and is

(87) $p(u_1, u_2, \ldots, u_{N-1} \mid T_1, T_2, \ldots, T_{N+1}) = p(u_1, u_2, \ldots, u_{N-1}) = \frac{\Gamma(\frac{1}{2}N(n-1))}{\Gamma^{N}(\frac{1}{2}(n-1))} \left(\left(1 - \sum_{i=1}^{N-1} u_i\right) \prod_{i=1}^{N-1} u_i\right)^{\frac{1}{2}(n-3)}$.


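As (87) is independent of $T_1, T_2, \ldots, T_{N+1}$, it is also the absolute probability
law of the u's and hence $k_3(T_1)$ is independent of $T_1$. In accordance with the
notation adopted for the left side of (86), namely $\zeta$, and since the choice of
$k_3(T_1)$ depends on $\epsilon$, n, and N, we may use $\zeta_\epsilon$ instead of $k_3(T_1)$. Then the region
w is determined by the inequality

(88) $\zeta = \sum_{i=1}^{N-1} u_i^2 + \left(1 - \sum_{i=1}^{N-1} u_i\right)^2 > \zeta_\epsilon$

or, returning to the original variables, by the inequality

(89) $\zeta = \sum_{i=1}^{N} S_i^4 \Big/ \left(\sum_{i=1}^{N} S_i^2\right)^2 > \zeta_\epsilon$,

where $\zeta_\epsilon$ is the root of the equation

(90) $\frac{\Gamma(\frac{1}{2}N(n-1))}{\Gamma^{N}(\frac{1}{2}(n-1))} \int \cdots \int_{\zeta > \zeta_\epsilon} \left(\left(1 - \sum_{i=1}^{N-1} u_i\right) \prod_{i=1}^{N-1} u_i\right)^{\frac{1}{2}(n-3)} du_1 \cdots du_{N-1} = \epsilon$.
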

This region w has the following property: of all the regions belonging to the
family $F_2(\epsilon)$, the derivative of the power function of w at the point $\nu = 0$ is the
greatest. Thus, as far as the values of $\nu$ close to zero are concerned, we may
say that, for testing H, w is the most powerful critical region in the family $F_2(\epsilon)$.

8. Methods of determining $\zeta_\epsilon$. To calculate $\zeta_\epsilon$ accurately we must calculate
the integral probability law of $\zeta$, that is to say,

(91) $P\{\zeta < z\} = \int \cdots \int_{\zeta < z} p(u_1, \ldots, u_{N-1})\, du_1 \cdots du_{N-1}$

for any z. The author was not able to achieve this. Therefore some methods
of approximation had to be looked for. This task becomes somewhat simplified
by noting that in most practical problems N will be very large, in the hundreds
or thousands, while n will probably not exceed 5.
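
The criterion $\zeta$ of (89), whose critical value $\zeta_\epsilon$ is sought here, is simple to compute from grouped data. A minimal sketch follows; the function name and the sample groups are hypothetical. It takes $S_i^2$ as the mean squared deviation within the ith group, in agreement with $\omega_i = nS_i^2$ in (69); since $\zeta$ is a ratio, any common divisor would give the same value.

```python
def zeta_statistic(groups):
    """Criterion (89): sum of S_i^4 divided by the square of the sum of S_i^2.

    `groups` is a list of N groups, each a list of n measurements.
    At least one group must show some variation, otherwise the ratio is undefined.
    """
    s2 = []
    for g in groups:
        mean = sum(g) / len(g)
        s2.append(sum((x - mean) ** 2 for x in g) / len(g))
    total = sum(s2)
    return sum(v * v for v in s2) / (total * total)

# Two groups with identical spread give the smallest possible value, 1/N.
print(zeta_statistic([[0.0, 2.0], [1.0, 3.0]]))          # 0.5
# All the variation concentrated in one group gives the largest value, 1.
print(zeta_statistic([[0.0, 2.0], [5.0, 5.0]]))          # 1.0
```

The hypothesis H is rejected when the computed value exceeds $\zeta_\epsilon$.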

To start, we notice that the range of $\zeta$ is limited by

(92) $1/N \leq \zeta \leq 1$.

The easiest way to see this is to look for maxima and minima of the sum

(93) $X = \sum_{i=1}^{N} S_i^4$

subject to the restriction that

(94) $\sum_{i=1}^{N} S_i^2 = T_1$.

We then easily find that

(95) $T_1^2/N \leq X \leq T_1^2$

and (92) follows directly.

Since $\zeta$ is a polynomial of the second order in the u's, we may consider its
moments. These will be functions of the expectations of the products
$\prod_{i=1}^{N} u_i^{k_i}$ where, for short, $u_N = 1 - \sum_{i=1}^{N-1} u_i$. Using (87) we easily find that

(96) $E\left(\prod_{i=1}^{N} u_i^{k_i}\right) = \frac{\Gamma(\frac{1}{2}N(n-1))}{\Gamma(\frac{1}{2}N(n-1) + \sum k_i)} \prod_{i=1}^{N} \frac{\Gamma(\frac{1}{2}(n-1) + k_i)}{\Gamma(\frac{1}{2}(n-1))}$.

In particular, if we let (n - 1)/2 = a

(97) $E(u_i^2) = \frac{a(a+1)}{Na(Na+1)}$,

(98) $E(u_i^4) = \frac{a(a+1)(a+2)(a+3)}{Na(Na+1)(Na+2)(Na+3)}$,

(99) $E(u_i^2 u_j^2) = \frac{a^2(a+1)^2}{Na(Na+1)(Na+2)(Na+3)}$.

Consequently and because $\zeta = \sum_{i=1}^{N} u_i^2$, we have

(100) $E(\zeta) = \mu_1' = (a+1)/(Na+1)$,

(101) $E(\zeta^2) = \mu_2' = \sum_{i=1}^{N} E(u_i^4) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} E(u_i^2 u_j^2) = \frac{(a+1)(a+2)(a+3)}{(Na+1)(Na+2)(Na+3)} + \frac{(N-1)a(a+1)^2}{(Na+1)(Na+2)(Na+3)}$.

The variance $\sigma_\zeta^2$ of $\zeta$ is therefore

(102) $\sigma_\zeta^2 = \frac{2a(a+1)(N-1)}{(Na+1)^2(Na+2)(Na+3)}$.

By a similar procedure we find that

(103) $E(\zeta^3) = \mu_3' = \frac{(a+1)(a+2)(a+3)(a+4)(a+5) + 3(N-1)a(a+1)^2(a+2)(a+3) + (N-1)(N-2)a^2(a+1)^3}{(Na+1)(Na+2)(Na+3)(Na+4)(Na+5)}$,

(104) $E(\zeta^4) = \mu_4' = \frac{\prod_{j=1}^{7}(a+j) + 4(N-1)a(a+1)\prod_{j=1}^{5}(a+j) + 3(N-1)a\prod_{j=1}^{3}(a+j)^2 + 6(N-1)(N-2)a^2(a+1)^3(a+2)(a+3) + (N-1)(N-2)(N-3)a^3(a+1)^4}{\prod_{j=1}^{7}(Na+j)}$.


One possible method of approximating $\zeta_\epsilon$ is to use the formulae above, together
with the higher moments whose formulae are easy to deduce. Some convenient
known distribution, say $p_0(\zeta)$, could be fitted to have its first two or three
moments coincide with those of the unknown true distribution of $\zeta$. We would
then look for better approximations by means of the functions

(105) $p_m(\zeta) = p_0(\zeta) \sum_{j=1}^{m} A_j \chi_j(\zeta)$

where the $\chi_j$'s denote polynomials which are orthogonal and normal with respect
to $p_0(\zeta)$ so that

(106) $\int \chi_j \chi_k\, p_0(\zeta)\, d\zeta = 1$ if j = k, and $= 0$ if $j \neq k$.

The constant coefficients $A_j$ are formed to minimize the integral

(107) $\int \left(p(\zeta) - p_0(\zeta) \sum_{j=1}^{m} A_j \chi_j(\zeta)\right)^2 d\zeta$.

They are expressible in terms of the known moments of $p(\zeta)$.
This is one possible way to approximate $p(\zeta)$ which would eventually lead to
the computation of $\zeta_\epsilon$ even for small values of N.
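
A present-day alternative, which is not one of the methods considered in the paper, is to estimate $\zeta_\epsilon$ by random sampling. Under H the law (87) of the u's is a Dirichlet law with all parameters equal to (n - 1)/2, so the u's may be generated as independent gamma variates divided by their sum. The sketch below is only an illustration under that reading of (87); the function names, the seed and the number of draws are arbitrary choices.

```python
import random

def simulate_zeta(n, N, draws, seed=0):
    """Draw values of zeta = sum u_i^2 with (u_1, ..., u_N) following law (87) under H."""
    rng = random.Random(seed)
    a = 0.5 * (n - 1)
    out = []
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for _ in range(N)]   # independent gamma(a) variates
        s = sum(g)
        out.append(sum(v * v for v in g) / (s * s))        # u_i = g_i / s
    return out

def upper_quantile(sample, eps):
    """Empirical value exceeded by a fraction of about eps of the sample."""
    ordered = sorted(sample)
    return ordered[min(len(ordered) - 1, int((1.0 - eps) * len(ordered)))]

sample = simulate_zeta(n=3, N=100, draws=2000)
print(sum(sample) / len(sample))          # should be near (a + 1)/(N a + 1) = 2/101
print(upper_quantile(sample, 0.05))       # Monte Carlo estimate of zeta_eps for eps = .05
```

The sample mean gives a quick check against the exact expectation (100); the accuracy of the estimated quantile depends on the number of draws.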

Remembering that we are concerned with large N's, we can prove that the
normalized distribution of $\zeta$, that is, the distribution of

(108) $(\zeta - \mu_1')/\sigma_\zeta$

tends to be normal as $N \to \infty$. However, the process of tending to the limit
is rather slow as may be seen from the following table of K. Pearson's $\beta_1$ and $\beta_2$.

TABLE II

Frequency constants of the distribution of $\zeta$

n     N      $\mu_1'$    $\sigma_\zeta$     $\beta_1$    $\beta_2$
3     100    .0198     .001922    .8652     5.042
3     200    .0099     .000693    .4618     4.244
3     400    .0050     .000248    .2410     3.587


Because of this and also because the proof that the distribution of (108) tends
to normality is not very straightforward, we shall not reproduce it. But it may
be well to point out that the cause of this slowness in tending to the limit lies
in the skewness of the distribution of each particular $u_i$ and in the mutual
dependency of all the $u_i$'s.
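
The frequency constants of Table II follow from the raw moments (100), (101), (103) and (104). The sketch below evaluates them in exact rational arithmetic, which avoids the heavy cancellation in the central moments, and then converts to $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. The function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def product_from_one(x, k):
    """Product (x + 1)(x + 2)...(x + k)."""
    out = Fraction(1)
    for j in range(1, k + 1):
        out *= x + j
    return out

def raw_moments_zeta(n, N):
    """Raw moments (100), (101), (103), (104) of zeta as exact fractions."""
    a = Fraction(n - 1, 2)
    P = lambda k: product_from_one(a, k)
    D = lambda k: product_from_one(N * a, k)
    m1 = (a + 1) / (N * a + 1)
    m2 = (P(3) + (N - 1) * a * (a + 1) ** 2) / D(3)
    m3 = (P(5) + 3 * (N - 1) * a * (a + 1) * P(3)
          + (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 3) / D(5)
    m4 = (P(7) + 4 * (N - 1) * a * (a + 1) * P(5) + 3 * (N - 1) * a * P(3) ** 2
          + 6 * (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 2 * P(3)
          + (N - 1) * (N - 2) * (N - 3) * a ** 3 * (a + 1) ** 4) / D(7)
    return m1, m2, m3, m4

def frequency_constants(n, N):
    """Mean, standard error, beta_1 and beta_2 of zeta."""
    m1, m2, m3, m4 = raw_moments_zeta(n, N)
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return float(m1), sqrt(c2), float(c3 ** 2 / c2 ** 3), float(c4 / c2 ** 2)

for N in (100, 200, 400):
    print(N, frequency_constants(3, N))
```

For n = 3 and N = 100 this gives approximately .0198, .001922, .8652 and 5.042, matching the first row of Table II.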

The most promising method seems to be the following. First consider the
two sums

(109) $T_1 = \sum_{i=1}^{N} S_i^2$ and $T_0 = \sum_{i=1}^{N} S_i^4$.

Obviously, these two sums satisfy the conditions of the limiting theorem of
S. Bernstein [29], [30] and, therefore, as $N \to \infty$, their joint normalized
distribution tends to a normal surface. Also, we may expect the process of tending
to the limit to be rapid in this case. If $p(T_0, T_1)$ denotes the limiting normal
distribution, the probability that $\zeta > z$ can be approximately calculated by the
integral

(110) $P\{\zeta > z\} = P\{T_0 > zT_1^2\} = \int_{-\infty}^{+\infty} dT_1 \int_{zT_1^2}^{\infty} p(T_0, T_1)\, dT_0$.


To calculate the limiting distribution $p(T_0, T_1)$ we need only the expectations,
say A and B, of $T_1$ and $T_0$ respectively, their standard errors, say $\sigma_1$ and $\sigma_2$,
and their correlation coefficient R. These may be obtained from the moments
of the $S_i^2$'s.

Formula (110) can be used not only for tabulating the integral probability
law of $\zeta$ and for determining $\zeta_\epsilon$, but also for an approximate calculation of the
power function of the test. For, if the limiting probability law $p(T_0, T_1)$ is
calculated using the moments of $S_i^2$ calculated from (70) with some $\nu > 0$, then
the integral (110) calculated with $z = \zeta_\epsilon$ gives us the probability $P\{\zeta > \zeta_\epsilon \mid \nu\}$
of the test detecting the falsehood of the hypothesis tested, that is, the power
function.

To save space, we shall now calculate the constants A, B, $\sigma_1$, $\sigma_2$, and R as
functions of $\nu > 0$. The values appropriate to the case when the hypothesis
tested is true will then be obtained from the general formulae by the mere
substitution of $\nu = 0$.

Since all the constants above depend on the expectations of $S_i^{2k}$, we use formula
(70) to calculate them. Denoting the expectation of $S_i^{2k}$ by $\mu_k$, we have

(111) $\mu_k = \frac{2(\frac{1}{2}nh_0\nu)^{\frac{1}{2}(n-1)}}{B(1/\nu, \frac{1}{2}(n-1))} \int_0^{\infty} \frac{S^{2k+n-2}\, dS}{(1 + \frac{1}{2}nh_0\nu S^2)^{\frac{1}{2}(n-1) + 1/\nu}}$.

Introducing the new variable

(112) $1 + \tfrac{1}{2} n h_0 \nu S^2 = t^{-1}$

makes the integration straightforward and gives

(113) $\mu_k = \left(\frac{2}{nh_0\nu}\right)^{k} \frac{\Gamma(1/\nu - k)\,\Gamma(\frac{1}{2}(n-1) + k)}{\Gamma(1/\nu)\,\Gamma(\frac{1}{2}(n-1))}$.

This formula holds good if $1/\nu > k$. Otherwise the kth moment is divergent.
So this approximate method of calculating the power function of the test is
applicable only for $\nu < .25$.

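Substituting k = 1, 2, 3, 4 in (113), we have

(114)
$\mu_1 = \frac{1}{nh_0}\,\frac{n-1}{1-\nu}$,

$\mu_2 = \left(\frac{1}{nh_0}\right)^2 \frac{n^2-1}{(1-\nu)(1-2\nu)}$,

$\mu_3 = \left(\frac{1}{nh_0}\right)^3 \frac{(n^2-1)(n+3)}{(1-\nu)(1-2\nu)(1-3\nu)}$,

$\mu_4 = \left(\frac{1}{nh_0}\right)^4 \frac{(n^2-1)(n+3)(n+5)}{(1-\nu)(1-2\nu)(1-3\nu)(1-4\nu)}$,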
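
The passage from the general formula (113) to the four special cases (114) can be verified numerically. The sketch below, with hypothetical function names and parameter values, evaluates (113) through log-gamma functions and compares it with the closed forms; it also enforces the condition $1/\nu > k$ under which the kth moment exists.

```python
import math

def moment_113(k, n, h0, nu):
    """k-th moment of S^2 from (113); requires 1/nu > k."""
    if not 1.0 / nu > k:
        raise ValueError("the k-th moment diverges unless 1/nu > k")
    a = 0.5 * (n - 1)
    log_mu = (k * math.log(2.0 / (n * h0 * nu))
              + math.lgamma(1.0 / nu - k) - math.lgamma(1.0 / nu)
              + math.lgamma(a + k) - math.lgamma(a))
    return math.exp(log_mu)

def moments_114(n, h0, nu):
    """Closed forms (114) for k = 1, 2, 3, 4."""
    c = 1.0 / (n * h0)
    d1 = 1.0 - nu
    d2 = d1 * (1.0 - 2.0 * nu)
    d3 = d2 * (1.0 - 3.0 * nu)
    d4 = d3 * (1.0 - 4.0 * nu)
    return [c * (n - 1) / d1,
            c ** 2 * (n * n - 1) / d2,
            c ** 3 * (n * n - 1) * (n + 3) / d3,
            c ** 4 * (n * n - 1) * (n + 3) * (n + 5) / d4]

n, h0, nu = 5, 2.0, 0.1            # illustrative values with nu < .25
for k, closed in enumerate(moments_114(n, h0, nu), start=1):
    print(k, moment_113(k, n, h0, nu), closed)
```

Each pair of printed values should agree to within rounding error.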


and now we have

(115) $A = \frac{N}{nh_0}\,\frac{n-1}{1-\nu}, \qquad B = \frac{N}{(nh_0)^2}\,\frac{n^2-1}{(1-\nu)(1-2\nu)}$,

(116) $\sigma_1^2 = \frac{N}{(nh_0)^2}\,\frac{(n-1)(2 + \nu(n-3))}{(1-\nu)^2(1-2\nu)}$,

(117) $\sigma_2^2 = \frac{2N}{(nh_0)^4}\,\frac{(n^2-1)(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)}$.

Inspecting formulae (115) to (118) makes us see that there is an advantage
in substituting two new variables

(119) $t_1 = \frac{nh_0}{N(n-1)}\,T_1, \qquad t_0 = \frac{(nh_0)^2}{N(n^2-1)}\,T_0$,

for $T_1$ and $T_0$. Their expectations, say $\vartheta_1$ and $\vartheta_2$, are

(120) $\vartheta_1 = \frac{1}{1-\nu}, \qquad \vartheta_2 = \frac{1}{(1-\nu)(1-2\nu)}$.


Probably without any danger of confusion, the S.E.'s of $t_1$ and $t_0$ may be
denoted by $\sigma_1$ and $\sigma_2$ also and we shall have

(121) $\sigma_1^2 = \frac{2 + \nu(n-3)}{N(n-1)(1-\nu)^2(1-2\nu)}, \qquad \sigma_2^2 = \frac{2(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{N(n^2-1)(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)}$.

Of course, the correlation coefficient of $t_1$ and $t_0$ is the same as that of $T_1$ and $T_0$,
namely R. Obviously, the inequality $T_0 > zT_1^2$ is equivalent to $t_0 > z_1t_1^2$
provided that

(122) $z = z_1\,\frac{n+1}{N(n-1)}$.
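
The constants of the rescaled variables can be rebuilt directly from the moments (114), which gives an independent check of the closed forms. In the sketch below the variances are N times the variances of $S_i^2$ and $S_i^4$, as follows from the independence of the groups, rescaled by the factors in (119). The correlation R is computed from its usual definition as covariance over the product of standard errors; this is an assumption of the sketch, not a transcription of the author's expression for R, and all names are hypothetical.

```python
import math

def moments_114(n, h0, nu):
    """Closed forms (114) for the first four moments of S^2."""
    c = 1.0 / (n * h0)
    d1 = 1.0 - nu
    d2 = d1 * (1.0 - 2.0 * nu)
    d3 = d2 * (1.0 - 3.0 * nu)
    d4 = d3 * (1.0 - 4.0 * nu)
    return (c * (n - 1) / d1,
            c ** 2 * (n * n - 1) / d2,
            c ** 3 * (n * n - 1) * (n + 3) / d3,
            c ** 4 * (n * n - 1) * (n + 3) * (n + 5) / d4)

def scaled_constants(n, N, h0, nu):
    """Expectations, variances and correlation of t_1 and t_0 as defined in (119)."""
    mu1, mu2, mu3, mu4 = moments_114(n, h0, nu)
    s1 = n * h0 / (N * (n - 1))                 # factor turning T_1 into t_1
    s0 = (n * h0) ** 2 / (N * (n * n - 1))      # factor turning T_0 into t_0
    var1 = s1 ** 2 * N * (mu2 - mu1 ** 2)
    var2 = s0 ** 2 * N * (mu4 - mu2 ** 2)
    cov = s1 * s0 * N * (mu3 - mu1 * mu2)
    return {"theta1": s1 * N * mu1, "theta2": s0 * N * mu2,
            "var1": var1, "var2": var2, "R": cov / math.sqrt(var1 * var2)}

print(scaled_constants(n=3, N=100, h0=1.0, nu=0.0))
```

For n = 3, N = 100 and $\nu = 0$ this gives expectations equal to 1, variances .01 and .05, and R equal to $\sqrt{4/5}$; the value of $h_0$ drops out after rescaling.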

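Now the problem of calculating (110) is reduced to finding

(123) $P\{\zeta > z\} = P\{t_0 > z_1t_1^2\} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-R^2}} \iint_{t_0 > z_1t_1^2} \exp\left\{-\frac{1}{2(1-R^2)}\left[\frac{(t_1-\vartheta_1)^2}{\sigma_1^2} - 2R\,\frac{(t_1-\vartheta_1)(t_0-\vartheta_2)}{\sigma_1\sigma_2} + \frac{(t_0-\vartheta_2)^2}{\sigma_2^2}\right]\right\} dt_1\, dt_0$.
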

We may conveniently see the workings of the test proposed by considering
formula (123). First consider the case when the hypothesis tested is true. Both
$\vartheta_1$ and $\vartheta_2$ reduce to unity. The region of highest frequency is around the point
$t_1 = t_0 = 1$. If N is large then both $\sigma_1$ and $\sigma_2$ are small so that the region of
significant frequency is rather small. The integral (123) is to be taken over
the region above the parabola $t_0 = z_1t_1^2$ passing through the origin of coordinates.
When $z_1$ is small and the parabola passes far below the point $t_1 = t_0 = 1$, the
probability $P\{\zeta > z\}$ will be close to unity. When $z_1 = 1$ this probability will
be less than ½ and it will diminish rapidly with further increases of $z_1$. Now
suppose that we have found the value $\zeta_\epsilon$ for which $P\{\zeta > \zeta_\epsilon \mid \nu = 0\} = \epsilon$ and
consider what will happen to (123) when $z = \zeta_\epsilon$ if $\nu$ is increased. Clearly, neither
$\sigma_1$ and $\sigma_2$ nor R are very sensitive to slight changes in $\nu$. Also $\vartheta_1$ will not
change very much. On the other hand, $\vartheta_2$ will increase rather fast. The final
conclusion is that the whole frequency surface corresponding to the integrand in
(123) will not change shape much but will shift to bring a greater amount of
frequency into the region of integration.

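To facilitate numerical calculations introduce

(124) $x = \frac{t_1 - \vartheta_1}{\sigma_1}, \qquad y = \frac{t_0 - \vartheta_2 - R\sigma_2(t_1 - \vartheta_1)/\sigma_1}{\sigma_2\sqrt{1-R^2}}$.

Now (123) may be rewritten as

(125) $P\{\zeta > z\} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-x^2/2}\, dx \int_{y(x, z_1)}^{+\infty} e^{-y^2/2}\, dy$,

where

(126) $y(x, z_1) = \frac{z_1(\vartheta_1 + \sigma_1 x)^2 - \vartheta_2 - R\sigma_2 x}{\sigma_2\sqrt{1-R^2}}$.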


Using formulae (125), (126) and (119) to (122), the following numerical 
values were obtained. 

TABLE III 

n = 3, N = 100, v = 0. 

    $z_1$        $P\{\zeta > z\}$
    .8           .9126
    .9           .7305
    1.0          .4905
    1.1          .2847
    1.2          .1495
    1.3          .0730
    1.4          .0335
    1.5          .0148
    1.6          .00644
    1.7          .00288
    1.34450      .05000
    1.54563      .01000


TABLE IV 

Power of the test for n = 3 and N = 100. 

    $\epsilon$     $\zeta_\epsilon$      v = .01     v = .10
    .05      .02689     .05823      .37482
    .01      .03091     .01234      .10699


The figures above are only approximate and we realize that the greater the 
value of v the less satisfactory is the approximation of the power function. A 
check of the goodness of the approximation and, if it proves satisfactory, a few 
numerical tables for practical applications of the test must be postponed to 
another publication. 

It is a pleasure to record the author's indebtedness to Miss Elizabeth Scott 
and also to Miss Julia Bowman for carrying out all the numerical work con¬ 
nected with the present paper. 
A CONCISE ANALYSIS OF CERTAIN ALGEBRAIC FORMS 

By Franklin E. Satterthwaite 

State University of Iowa, Iowa City, Iowa 

Many of the statistics in common use are functions of homogeneous algebraic 
forms in the items of the sample. Among such statistics are the mean, a linear 
form; the variance, a quadratic form; and the product moment, a bilinear form. 
With the extension of the science, the mathematical statistician is faced with 
the study of more complex statistics and the associated algebraic forms and 
matrices. The purpose of this paper is to set forth concise and efficient nota¬ 
tions and methods which may be used in such analysis. 

We shall borrow the essential features of our notation from differential 
geometry and tensor analysis. The Kronecker delta is defined as, 

$\delta^i_j = 1, \quad i = j,$

$\delta^i_j = 0, \quad i \ne j.$

The summation convention provides that summation be performed with respect 
to any index appearing twice in the same term. Thus, 

$x_i y^i = x_1 y^1 + x_2 y^2 + \cdots.$

To extend the use of the summation convention, we shall frequently place 
indices on the numeral, 1. Thus, 

$1^i x_i = 1^1 x_1 + 1^2 x_2 + \cdots = x_1 + x_2 + \cdots.$
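The two conventions just introduced can be sketched numerically. The following is only an illustration: the helper name `contract` and the sample vectors are assumptions, not part of the paper. It forms $x_i y^i$ by summing over the repeated index, and forms $1^i x_i$ by contracting with a vector of ones.

```python
def contract(x, y):
    """Sum x_i * y^i over the repeated index i (summation convention)."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Hypothetical sample values, chosen only for illustration.
x = [2, 5, 7]
y = [1, 3, 4]
ones = [1] * len(x)        # the indexed numeral 1^i

inner = contract(x, y)     # x_i y^i = 2*1 + 5*3 + 7*4
total = contract(ones, x)  # 1^i x_i = x_1 + x_2 + x_3
print(inner, total)
```

Placing an index on the numeral 1 turns a plain sum into a contraction, so sums and inner products are handled by one rule.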

Symmetry in the calculations is more striking if the pair of summation indices 
appears, one as a superscript, the other as a subscript. Therefore we allow the 
shifting of an index from the one position to the other at will. Thus, 

$x_i \equiv x^i.$

Where no confusion will arise, indices may be placed outside of parentheses. 



The standard notations for averages will be used. 
Unless otherwise indicated, the symbol, $\Sigma$, will always stand for summation 
over all unrepeated indices including any already averaged under conventions 
(1) and (2). Thus, 

2£ s = Nx\ 


The following simple formulas are fundamental to the arithmetic of this 
notation. They are obvious upon the expansion of the summations. Each 
index varies from 1 to a. These formulas are 

S{xj = x it 

_ t* 

Oi Oj — o< f 

= i{, 

ljl* = all, 

= a, 



The symbols of this notation obey the associative, commutative, and the 
distributive laws of simple arithmetic so that the operations of summation, 
multiplication, and squaring are very easy. Thus for the product of two linear 
forms we have 



The sum of squares is obtained by the simple repetition of the form, 


(3) 


2*! = 2(5’ia;/) s - (6$ x,)(«'*/), 
= («jX,-)( 5 |U*) = ijfeSyX*. 


Two other sums of squares occur so frequently that they should be particularly 
noted: 


( 4 ) 
(5) 


2 (*< — £)* = 2 


[RW 

= (ifi — - 8 — 5 - + - -) 

\ a a aa/ik 


The striking similarity in the coefficients of the second and final expressions for 
the summations in (3), (4), and (5) should not be overlooked. 

Where we have multiple classification of the variables, we may operate on 
each index separately. For example, in a four-way analysis of variance we may 
have the quadratic form, 


$Q = \Sigma\{\bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} + \bar{x}_{i\cdot\cdot\cdot}\}^2,$



The rank is one of the important properties of a quadratic form or matrix. 
An experienced mathematician usually has a rule of thumb for determining the 
ranks of those quadratic forms occurring in statistical analysis. In order to 
formulate such rules of thumb into a simple and rigorous algebra, the author 
here defines a type of matrix multiplication which he calls “uncontracted matrix 
multiplication” and which he represents by the symbol, O. 

Let $A = \|\alpha^i_j\|$ and $B = \|\beta^k_l\|$ be two matrices of any finite orders and with 
ranks $R_A$ and $R_B$. We define the uncontracted product, $A \; O \; B$, as follows: 


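$C = A \; O \; B = \|\alpha^i_j\| \; O \; \|\beta^k_l\| = \|\alpha^i_j B\| = \begin{Vmatrix} \alpha^1_1 B & \alpha^1_2 B & \cdots \\ \alpha^2_1 B & \alpha^2_2 B & \cdots \\ \cdots & \cdots & \cdots \end{Vmatrix}$

where 

$\alpha^i_j B = \begin{Vmatrix} \alpha^i_j\beta^1_1 & \alpha^i_j\beta^1_2 & \cdots \\ \alpha^i_j\beta^2_1 & \alpha^i_j\beta^2_2 & \cdots \\ \cdots & \cdots & \cdots \end{Vmatrix}.$

Thus the elements of $C$ are 

$\gamma^{ik}_{jl} = \alpha^i_j\beta^k_l.$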

We therefore see that whenever we have a matrix whose elements can be 
factored in the above manner, then the matrix can be expressed as the uncon¬ 
tracted product of simple matrices. Thus, 

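if $\gamma^{ik\cdots}_{jl\cdots} = \alpha^i_j\beta^k_l\cdots,$

then $\|\gamma^{ik\cdots}_{jl\cdots}\| = \|\alpha^i_j\| \; O \; \|\beta^k_l\| \; O \cdots.$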

We shall now prove that the rank of the uncontracted product, C = A O B, 
of two matrices is equal to the product of the ranks. This follows because for 
the matrix, A, there always exists a set of elementary transformations defined 
by the equations, 

t a : A - Q Q K 07 aZ, , e\ * o, % - j, 

where the 0”s, i = j, are coefficients providing for the multiplication of the ele¬ 
ments of a row by a constant not zero; the 0<’s, i ^ j } are coefficients providing 
for the addition to the elements of a row a linear function of the corresponding 
elements of the other rows; the 0’s are similar coefficients referring to columns; 

the symbol 0^ is an operator indicating the interchange of the ith and jth 

rows (columns); and the a&’b have the values, 

Adi = 1, i = j < Ra, 

= 0, otherwise. 

This set of transformations reduces A to a diagonal matrix with R A non-zero 
elements. A similar set of transformations, 



exists for the matrix B . We next define two sets of transformations by the 
equations, 

T’y. us*>-(5)(2) *.»-<«: A 
T',: U .si) - (?)(£)«.*ru «3, 
which are also elementary because of their relationship to $T_A$ and $T_B$. Now 
if we subject the matrix, $C = \|(\alpha^i_j\beta^k_l)\|$, to the transformations $T'_A$ followed by 
the transformations $T'_B$, it will be reduced to the diagonal form $C = \|(a^i_j b^k_l)\|$ 
with exactly $R_A R_B$ non-zero elements. Therefore, since the rank of a matrix is 
invariant under elementary transformations, the rank of $C = A \; O \; B$ must 
be $R_A R_B$. 
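The rank theorem can be checked on a small case. The sketch below assumes that the uncontracted product is the construction now usually called the Kronecker product, which is what the element formula $\alpha^i_j\beta^k_l$ suggests. The function names and the two test matrices are hypothetical, and ranks are computed by exact elimination over the rationals.

```python
from fractions import Fraction

def uncontracted(A, B):
    """Uncontracted product: every element of A multiplies the whole of B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def rank(M):
    """Rank by exact Gaussian elimination with rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical matrices: A is 3 x 2 of rank 2, B is 3 x 3 of rank 2.
A = [[1, 2], [2, 4], [0, 1]]
B = [[1, 1, 0], [0, 1, 1], [1, 2, 1]]
C = uncontracted(A, B)
print(rank(A), rank(B), rank(C))
```

For these matrices the rank of the product equals the product of the ranks, as the theorem requires.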

We shall now determine the ranks of several matrices which occur frequently 
in statistics: 


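$A_1 = \|1^i\| = \|1, 1, 1, \cdots\|, \quad R_1 = 1.$

$A_2 = \|1^i_j\| = \|1^i 1_j\| = \|1^i\| \; O \; \|1_j\|,$

$R_2 = 1 \cdot 1 = 1.$

$A_3 = \|\delta^i_j\|, \quad R_3 = a.$

$A_4 = \left\|\left(\delta - \tfrac{1}{a}\right)^i_j\right\|, \quad R_4 = a - 1.$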


The proof that $R_4 = a - 1$ involves two steps. First summing the rows of $A_4$ 
we have, 



so that $R_4 \le a - 1$. Second if we subtract the elements of the first row from 
the corresponding elements of each of the other rows we obtain, 


A< 




I i = 1 

ii* * i. 


Since the $(a - 1)$st order determinant in the lower right-hand corner is not 
equal to zero, $R_4 \ge a - 1$. 

Applying our theorem on uncontracted products, the ranks of complicated 
matrices can often be determined by inspection. Thus: 


hi4-d;ih‘ ! hih); 

$R_5 = a(b - 1).$

(• - «)’(* - O'. 

$R_6 = (a - 1)(b - 1).$

41 - (• - OX* - % -1[(* -«)'■'■][(* - M 


$R_7 = 1 \cdot 1 = 1.$
The matrix $A_7$ may be confusing at first sight. Note that each element, $\alpha^i_j$, 
is a quadratic form in the y's. This form is of rank 1 and can be factored into 
two linear factors, one independent of j, the other independent of i. 
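Ranks found "by inspection" in this way can be confirmed by direct computation. The sketch below is hedged: the forms used, namely the identity, the matrix $\delta - 1/a$, and their uncontracted products, are assumed shapes chosen to match the stated ranks $a - 1$, $a(b - 1)$ and $(a - 1)(b - 1)$. They are not transcribed from the garbled displays, and all helper names are hypothetical.

```python
from fractions import Fraction

def kron(A, B):
    """Uncontracted (Kronecker) product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
    return r

def identity(a):
    return [[Fraction(int(i == j)) for j in range(a)] for i in range(a)]

def centering(a):
    """The matrix with elements delta_ij - 1/a."""
    return [[Fraction(int(i == j)) - Fraction(1, a) for j in range(a)]
            for i in range(a)]

a, b = 4, 3
r_center = rank(centering(a))                     # expected a - 1
r_within = rank(kron(identity(a), centering(b)))  # expected a(b - 1)
r_inter = rank(kron(centering(a), centering(b)))  # expected (a - 1)(b - 1)
print(r_center, r_within, r_inter)
```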

To illustrate the application of these techniques to a fairly complicated prob¬ 
lem, we shall construct and verify a design for the analysis of variance involving 
a regression line. It is known that sufficient conditions for such a design to 
be valid are: 

1. The sum of the quadratic forms be equal to the sum of the squares of the 
variables, and 

2. The sum of the ranks of the forms be equal to the number of variables. 
We shall use the first condition to set up our design. Thus, 


( 6 ) 


= [«$**,*« 



+ 

+ 


K*s -: d:( i. 



XklX 


Rewriting this in the usual notation, we have for our tentative design, 


(7) 


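$\Sigma x_{ij}^2 = \Sigma[x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}]^2 + \Sigma[\bar{x}]^2 + \Sigma[\bar{x}_{\cdot j} - \bar{x}]^2$

$\qquad + \Sigma[(r\sigma_x/\sigma_y)(y_i - \bar{y})]^2 + \Sigma[(\bar{x}_{i\cdot} - \bar{x}) - (r\sigma_x/\sigma_y)(y_i - \bar{y})]^2.$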


In order to determine the corresponding equation for the ranks, we rewrite (6) 
in the form, 

^ - {(• - f)l" - 0! + ©‘(O' + 01 s - 0,' 

First we must determine the rank of the unfamiliar matrix, 


We see that the rank of $A_8$ cannot be greater than $a - 2$ because two linear 
relations exist between the rows, namely, 


1 ‘a’i = 0, since 
y { ai = 0, since 



aa 


2 

V * 
To show that the rank of $A_8$ cannot be less than $a - 2$, we subtract the elements 
of the first row from the corresponding elements of each of the last $a - 2$ rows, 
giving, 


Tj-i- 7 —-7—Tv?- 

ai1 («- -) S[)y, 

A— °/« _ j'-V _®A_ »>i,2 

cw$ curl 

Multiplying each element of the second column by —^3 — ^ ^ ~ ^ V* 


and adding the result to the corresponding element of the jth column for $j = 3, 
4, \cdots, a$, we see that the $(a - 2)$th order determinant in the lower right-hand 
corner becomes $|\delta^i_j|$ which is not equal to zero. Therefore the rank of $A_8$ must 
equal $a - 2$. 

Referring to equation (8), we now write down the corresponding equation for 
ranks using the theorem on uncontracted products. Thus, 


$\Sigma\,\text{Ranks} = (a - 1)(b - 1) + (1)(1) + (1)(b - 1) + (1)(1)(1) + (a - 2)(1)$


$\qquad = ab.$


Hence the quadratic forms in the right member of equation (7) are mutually 
independent and each, measured in units of the variance of the population, is 
distributed as is Chi-square with the appropriate number of degrees of freedom. 



A SYMMETRIC METHOD OF OBTAINING UNBIASED ESTIMATES 
AND EXPECTED VALUES 

By Paul L. Dressel 

Michigan State College, East Lansing, Michigan 

The problem of finding the relationship between moment functions of a 
sample and moment functions of the population from which the sample was 
obtained has, of necessity, received much attention. The problem has two 
parts: first, to find the expected value of a given sample moment function; 
second, to find the estimate of a given population moment function. Thus, if 
$m_i$ represent the ith central moment of a sample and $\mu_i$ represent the ith central 
moment of the population, the first part of the problem requires that we find 
the mean value of $m_i$ for all possible samples of a given size and express it in 
terms of the $\mu$'s. The second part requires that we find a function of the $m_i$'s 
such that the mean value, taken for all possible samples of a given size, be a 
given $\mu_i$. For the case i = 4 we have the well known results: 

$E[m_4] = \frac{(n - 1)(n^2 - 3n + 3)}{n^3}\,\mu_4 + \frac{3(n - 1)(2n - 3)}{n^3}\,\mu_2^2,$

$E^{-1}[\mu_4] = \frac{n^2(n^2 - 2n + 3)}{n^{(4)}}\,m_4 - \frac{3n^2(2n - 3)}{n^{(4)}}\,m_2^2.$

These results are based on the assumption of an infinite population. In spite 
of the inverse relationship existing between estimates and expected value, the 
expressions above show no simple relationship. This lack of simplicity of rela¬ 
tionship between estimate and expected value is directly traceable to the fact 
that such results are usually obtained for infinite populations. When results 
are obtained for finite populations a symmetry is found to exist which reduces 
to a single problem the two parts stated above. Since this should be evident 
to anyone upon reflection, the main purpose of the present paper may be 
considered as that of indicating one method of demonstrating the result stated 
above as well as showing the relationship of this method to material appearing in 
previously published papers. 

Consider a finite population consisting of N items $x_1, \cdots, x_N$ and samples of n 
items taken from that population, the sampling being done without replacement. 
We shall utilize the power product notation of P. S. Dwyer [1; p. 13] 

$(q_1 \cdots q_r) = \sum_{i_1 \ne i_2 \ne \cdots \ne i_r} x_{i_1}^{q_1} x_{i_2}^{q_2} \cdots x_{i_r}^{q_r}$

to represent a power product formed for the sample and 

(2) $[q_1 \cdots q_r] = \sum_{i_1 \ne i_2 \ne \cdots \ne i_r} x_{i_1}^{q_1} x_{i_2}^{q_2} \cdots x_{i_r}^{q_r}$

to represent like power products formed for the population. An arbitrary 
moment function of weight r of the sample is indicated by 

r! 

(3) /„ _ i T"i (<?0 1 ••• (<Z<) 


toO’ 


toO*'iri! • • • T|! 


and likewise a moment function of the population is indicated by 

r! 


(4) 


1 A. 


1 *• (gi!) r ‘ • • • (?«!)"*i! •••*.! 


tor ••• ton* 


where the summation extends over all partitions of r. 

It now is convenient to express each of the expressions (3) and (4) in terms 
of power products. We shall utilize for this purpose an expansion theorem 
which is the converse of a theorem stated by Dwyer, [1; p. 34] and [2; pp. 37-39], 
which can be proved in a similar fashion. 

This converse theorem follows: 

If any isobaric sum of products of power sums indicated by 


(5) 


r! 


( 9l ,)'i ... ( g ,0' , #i! ••• 


tor 


be expanded in terms of power products in a form indicated by 

r! 


( 6 ) 


SB pM... »*• 




• (pi'.r ■■■ (p. o’* t.i [ p *' *’* v: ' ] 

then the coefficient $B_r$ of the power sum [r] is given by 


(7) 


B r = 2 


r! 


(pxiri... (p.o'-in 

and the coefficient $B_{r_1 r_2 \cdots r_m}$ of $[r_1 r_2 \cdots r_m]$ is 

(8) $B_{r_1 r_2 \cdots r_m} = \overline{B_{r_1} B_{r_2} \cdots B_{r_m}}$


where the barred product indicates a symbolic multiplication by suffixing of sub¬ 
scripts. 

This is exemplified by 


$B_{32} = \overline{B_3 B_2} = (A_3 + 3A_{21} + A_{111})(A_2 + A_{11})$


$\qquad = A_{32} + A_{311} + 3A_{221} + 4A_{2111} + A_{11111}.$
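The barred product is easy to mechanize. The sketch below is an illustration only: the dictionary representation and the name `suffix_product` are assumptions. Each symbolic sum is stored as a map from a subscript tuple to its coefficient, and terms are multiplied by joining subscripts and sorting them into descending order.

```python
from collections import Counter

def suffix_product(p, q):
    """Barred product: multiply coefficients and suffix (join) the subscripts."""
    out = Counter()
    for sub_p, coef_p in p.items():
        for sub_q, coef_q in q.items():
            joined = tuple(sorted(sub_p + sub_q, reverse=True))
            out[joined] += coef_p * coef_q
    return dict(out)

# B_3 = A_3 + 3 A_21 + A_111 and B_2 = A_2 + A_11, as in the example above.
B3 = {(3,): 1, (2, 1): 3, (1, 1, 1): 1}
B2 = {(2,): 1, (1, 1): 1}
B32 = suffix_product(B3, B2)
for subscripts, coef in sorted(B32.items(), reverse=True):
    print(coef, "A_" + "".join(map(str, subscripts)))
```

The coefficient 4 of $A_{2111}$ arises from two sources, $3A_{21} \cdot A_{11}$ and $A_{111} \cdot A_2$, which the counter accumulates automatically.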

Using this theorem the moment functions (3) and (4) are easily expanded in 
terms of power products. In this latter form the expected value of the sample 
moment function is easily found by utilizing the fact that 


$\frac{E[(q_1 \cdots q_r)]}{n^{(r)}} = \frac{[q_1 \cdots q_r]}{N^{(r)}}.$
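The relation between the expected sample power product and the population power product can be verified exactly on a small finite population by enumerating every sample drawn without replacement. The sketch assumes that $n^{(r)}$ denotes the falling factorial $n(n-1)\cdots(n-r+1)$. The population values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

def power_product(values, exponents):
    """Sum of x_i^q1 * x_j^q2 * ... over ordered tuples of distinct items."""
    total = 0
    for idx in permutations(range(len(values)), len(exponents)):
        term = 1
        for i, q in zip(idx, exponents):
            term *= values[i] ** q
        total += term
    return total

def falling(m, r):
    """Falling factorial m(m-1)...(m-r+1); assumed meaning of m^(r)."""
    out = 1
    for k in range(r):
        out *= m - k
    return out

population = [1, 2, 4, 7, 11]   # hypothetical finite population, N = 5
n = 3                           # sample size
exponents = (2, 1)              # the power product (21)

samples = list(combinations(population, n))
mean_sample = Fraction(sum(power_product(s, exponents) for s in samples),
                       len(samples))
lhs = mean_sample / falling(n, len(exponents))
rhs = Fraction(power_product(population, exponents),
               falling(len(population), len(exponents)))
print(lhs, rhs)
```

Both sides agree because each ordered tuple of distinct population items appears in the same fraction of all samples.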
Now if the expected value of the sample moment function be equated to the 
population moment function (both being in power product form) we obtain a 
set of equations connecting the coefficients of a sample moment function and a 
population moment function. Since either the coefficients of the sample mo¬ 
ment function or those of the population moment function may be assigned 
and the others solved for, this set of equations enables one to solve two problems. 
First, we may find unbiased estimates—moment functions of the sample such 
that their expected value is some preassigned population moment function. 
Second, we may find expected values—moment functions of the population such 
that they are expected values of some preassigned sample moment function. 
From the symmetry of this set of equations, we shall see that any result ob¬ 
tained from the system has, through the symmetry, a dual role. 

The foregoing discussion may be clarified by an example. Let $A_2[2] + A_{11}[1]^2$ 
be the population moment function. In terms of power products this becomes 
$(A_2 + A_{11})[2] + A_{11}[11]$. The sample moment function $a_2(2) + a_{11}(1)^2$ becomes 
in terms of power products $(a_2 + a_{11})(2) + a_{11}(11)$ and its expected value is 

$\frac{n}{N}(a_2 + a_{11})[2] + \frac{n^{(2)}}{N^{(2)}}\,a_{11}[11].$

By equating this to the population moment function above we obtain 

$n^{(2)} a_{11} = N^{(2)} A_{11},$

$n(a_2 + a_{11}) = N(A_2 + A_{11}),$

and the symmetry of the system is apparent. 

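If 

$\rho_i = \frac{n^{(i)}}{N^{(i)}}, \qquad \tau_i = \frac{N^{(i)}}{n^{(i)}} = \frac{1}{\rho_i},$

the solutions of the system are 

(9) $a_{11} = \tau_2 A_{11}, \qquad A_{11} = \rho_2 a_{11},$

$\quad\;\; a_2 = \tau_1 A_2 + (\tau_1 - \tau_2)A_{11}, \qquad A_2 = \rho_1 a_2 + (\rho_1 - \rho_2)a_{11}.$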
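The weight 2 solution can be checked by brute force. In the sketch below the population, the sample size, and the coefficients $a_2$, $a_{11}$ are arbitrary illustrative choices. It is assumed that $\rho_i = n^{(i)}/N^{(i)}$ with falling factorials. The code computes $A_2$ and $A_{11}$ from the relations $A_{11} = \rho_2 a_{11}$ and $A_2 = \rho_1 a_2 + (\rho_1 - \rho_2)a_{11}$ and compares the average of the sample moment function over all samples with the population moment function.

```python
from fractions import Fraction
from itertools import combinations

def falling(m, r):
    """Falling factorial m(m-1)...(m-r+1); assumed meaning of m^(r)."""
    out = 1
    for k in range(r):
        out *= m - k
    return out

def rho(i, n, N):
    return Fraction(falling(n, i), falling(N, i))

def moment_function(values, c2, c11):
    """c2 * (sum of squares) + c11 * (sum)^2."""
    return c2 * sum(v * v for v in values) + c11 * sum(values) ** 2

population = [1, 2, 4, 7, 11]            # hypothetical population
N, n = len(population), 3
a2, a11 = Fraction(3), Fraction(-2)      # arbitrary sample coefficients

A11 = rho(2, n, N) * a11
A2 = rho(1, n, N) * a2 + (rho(1, n, N) - rho(2, n, N)) * a11

samples = list(combinations(population, n))
expected = sum(moment_function(s, a2, a11) for s in samples) / len(samples)
target = moment_function(population, A2, A11)
print(expected, target)
```

Read in the other direction, the same relations give the sample coefficients whose moment function is an unbiased estimate of a preassigned population function, which is the dual role of the system.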

In a similar manner if we use moment functions of weight 3 we begin with 

$A_3[3] + 3A_{21}[2][1] + A_{111}[1]^3,$

$a_3(3) + 3a_{21}(2)(1) + a_{111}(1)^3,$

and obtain the system of equations 

$n^{(3)} a_{111} = N^{(3)} A_{111},$

$n^{(2)}(a_{21} + a_{111}) = N^{(2)}(A_{21} + A_{111}),$

$n(a_3 + 3a_{21} + a_{111}) = N(A_3 + 3A_{21} + A_{111}),$
with solutions 

(10) $A_{111} = \rho_3 a_{111},$

$\quad\;\;\; A_{21} = \rho_2 a_{21} + (\rho_2 - \rho_3)a_{111},$

$\quad\;\;\; A_3 = \rho_1 a_3 + 3(\rho_1 - \rho_2)a_{21} + (\rho_1 - 3\rho_2 + 2\rho_3)a_{111}.$


The solutions for the a's in terms of the A's are obtainable from the given results 
in an obvious manner. 

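If we use the Carver functions [3; p. 104] 

$P_1 = \rho_1, \qquad P_{11} = \rho_2,$

$P_2 = \rho_1 - \rho_2, \qquad P_{21} = \rho_2 - \rho_3,$

$P_3 = \rho_1 - 3\rho_2 + 2\rho_3, \qquad P_{22} = \rho_2 - 2\rho_3 + \rho_4,$

$P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4, \qquad P_{111} = \rho_3,$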


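or in general 

(11) $P_r = \sum_{t=1}^{r} \rho_t \sum (-1)^{t-1} \frac{r!\,(t - 1)!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\,\pi_1! \cdots \pi_s!}$

and 

$P_{r_1 r_2 \cdots r_m} = \overline{\overline{P_{r_1} P_{r_2} \cdots P_{r_m}}}$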

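where the double barred product indicates a symbolic multiplication by 
addition of subscripts exemplified by 

$P_{32} = \overline{\overline{P_3 P_2}} = (\rho_1 - 3\rho_2 + 2\rho_3)(\rho_1 - \rho_2)$

$\qquad = \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5;$
the results (9) and (10) may be written 

$A_{11} = P_{11} a_{11}, \qquad A_3 = P_1 a_3 + 3P_2 a_{21} + P_3 a_{111},$

$A_2 = P_1 a_2 + P_2 a_{11}, \qquad A_{21} = P_{11} a_{21} + P_{21} a_{111},$

$A_{111} = P_{111} a_{111}.$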
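The double barred product differs from the barred product only in that subscripts are added instead of suffixed. A minimal sketch follows, with the representation and the name `additive_product` assumed for illustration: each Carver function is stored as a map from the subscript of $\rho$ to its coefficient.

```python
from collections import Counter

def additive_product(p, q):
    """Double barred product: multiply coefficients, add subscripts of rho."""
    out = Counter()
    for i, ci in p.items():
        for j, cj in q.items():
            out[i + j] += ci * cj
    return dict(out)

P2 = {1: 1, 2: -1}            # P_2 = rho_1 - rho_2
P3 = {1: 1, 2: -3, 3: 2}      # P_3 = rho_1 - 3 rho_2 + 2 rho_3

P32 = additive_product(P3, P2)
P22 = additive_product(P2, P2)
print(P32)
print(P22)
```

The second call recovers $P_{22} = \rho_2 - 2\rho_3 + \rho_4$ from the table of Carver functions, which gives an independent check on the rule.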

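Similarly for weight 4 we obtain 

$A_4 = P_1 a_4 + 4P_2 a_{31} + 3P_2 a_{22} + 6P_3 a_{211} + P_4 a_{1111},$

$A_{31} = P_{11} a_{31} + 3P_{21} a_{211} + P_{31} a_{1111},$

$A_{22} = P_{11} a_{22} + 2P_{21} a_{211} + P_{22} a_{1111},$

$A_{211} = P_{111} a_{211} + P_{211} a_{1111},$

$A_{1111} = P_{1111} a_{1111}.$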
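In general 

(12) $A_r = \sum P_{\pi_1 + \pi_2 + \cdots + \pi_s}\,\frac{r!}{(p_1!)^{\pi_1}(p_2!)^{\pi_2} \cdots (p_s!)^{\pi_s}\,\pi_1! \cdots \pi_s!}\;a_{p_1^{\pi_1} \cdots p_s^{\pi_s}},$

(13) $A_{r_1 r_2 \cdots r_m} = \overline{A_{r_1} A_{r_2} \cdots A_{r_m}},$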

where as before the barred product indicates a symbolic multiplication by 
suffixing of subscripts. 

If in 






;'<«i0'‘-..(«.0'‘in!...w,! vvw 
(-l)' ,+ '* + ” + '* ( T 1 + «+-.. + r, - 1)1 


n T,+I,H-+1, 


the moment function of the sample which is thereby represented is the Thiele 
seminvariant $l_r$ of the sample. If the A's are solved for by means of the 
appropriate set of equations the expected value of $l_r$ is found. Thus we find 


N*n m 

Elk] = 

A l) n 

1 y* n w n «»w* 

(n - NKNn - 6)*, 


E[l\] = 


N*n w 


.7=4 :X * ~ vS>^i (" - WWV “ 71 “ N ~ !)«., 


vV*> e 5ArV 8) 

a« <« - »W” -“>*. 

JVV W JV* n ( *> 

«« = ^X,X* - ( n - N)(Nn -n-N- 5)*, 
where the k system of seminvariants used here is defined by 

= S £ (~1)‘ HiUtr-i, 

a?) 2 ;- )*' x 

Ksr+1 ‘ h (_1) \i + r)r+7+i**»*»- 


By virtue of the symmetry noted earlier it follows that the estimates of the 
Thiele seminvariants and products of these seminvariants of weight < 5 are 
obtainable from the last results by replacing $E$ by $E^{-1}$ (estimate of), $\lambda_i$ by $l_i$, 
$l_i$ by $\lambda_i$, and $N$ by $n$. In this manner we find that $L_4$, the estimate of $\lambda_4$, is 

, n 4 \r< 4 > M i 2 ) n 2 

as) u - e-'m + ~ n)W "" 6> *- 


It is of some interest to note in the results (16) above that in those expected 
values or estimates which contain more than one term the factor N — n occurs 
in the second term. This, and the form of other coefficients involved in the 
terms, shows that as the sample size approaches the population size the sample 
seminvariants approach the population seminvariants. Another characteristic 
of such results as those given in (16) is that infinite sampling formulas are easily 
obtainable therefrom. Thus if in $L_4$ given in (18) $N \to \infty$ we find 


Li 


^4 3 

71 * | Tt * 

i 4 1 - 77, k 4 

n (4) n (4) 


n a (n + 1 ) 


nu 


3n*(» - 1) 2 

mi, 


w 


( 4 ) 


the first of these forms checking the result given by Dressel [4; p. 45] and the 
second form being identical with that given by Fisher [5]. 

The results exhibited above for finite sampling may lead to a mistaken idea 
about the simplicity of the results. Simplicity decreases rapidly as the weight 
increases. Thus for weight 6 we find 


E[k] = 


N*n (t) 
N W) n* 


X« + 


2N*n fi) 

N U) n* 


(n - N)(Nn - 20)[8 M6 - 15 ww + 10 m! ~ 45m!] 


(19) 


N z n a) 

+ (n - N)[Nn(n + N) - 12 nN + 60] 

• [11m« + 105m4M2 — 50 m! + 60m!] 

(AX[2 (2) 

Sxr-i (« ~ N)[Nn(N 2 + nN + n 2 ) - UnN(N + n) + 71 Nn - 120] 
N w n 6 

, 10 Nn™ , An , 6n W) , ...... , 2n® , . n \ 

A T(4> n b n ^ N w n b W N)(N + n 5) 


Again by letting $N \to \infty$ infinite sampling results are obtained. Much of this 
last result vanishes in that case. 

It has been demonstrated that the k system of seminvariants is invariant 
under estimation in the case of infinite sampling [4; p. 53]. It is therefore of 
some interest to note that this system also possesses the property for finite 
sampling without replacement. The proof of this is quite simple. Denote the 
estimate of $k_i$ by $K_i$ and the fundamental relations are 


K 2r - 



K ^ 1 ~ **+»• 
These expressions hold for any n and hence for a population of N. Let $K'_{2r}$ and 
$K'_{2r+1}$ denote functions corresponding to $K_{2r}$ and $K_{2r+1}$ but with population 
moments replacing sample moments and we have 


K > = ** 
iV8r ^(2) * 2r > 


Kl 


Jr-fl 


N* 

ltJr+1 • 


Since the power product mode of formulation of $K_{2r}$ and $K_{2r+1}$ insures that 
$E[K_{2r}] = K'_{2r}, \qquad E[K_{2r+1}] = K'_{2r+1}$

it follows that 


«*« = *[£ K 


, _ N’ 

lr ” tfa) Kir > 


or 


E[k tr ] 


n w N 2 

n*N™ **' 


Similarly 

pr , 

•fil«2r41J Kir+1 ’ 

thus establishing the theorem stated above. 
DETERMINATION OF SAMPLE SIZES FOR SETTING 
TOLERANCE LIMITS 

By S. S. Wilks 

Princeton University, Princeton , N. J. 

1. Introduction. In the mass production of a given product or apparatus 
piece-part, Shewhart 1 has discussed a practical procedure for detecting the exist¬ 
ence of assignable causes of variation in a given quality characteristic of the 
product as measured by a variable x. For example, x may be the thickness in 
inches of a washer or the tensile strength in pounds of a small aluminum casting 
made according to a given set of specifications; x varies in value from washer 
to washer or from casting to casting. Now suppose assignable causes of vari¬ 
ability in x have been detected by Shewhart’s procedure and have been suffi¬ 
ciently well eliminated by making appropriate refinements in the manufacturing 
process so that for all practical purposes the remaining variability may be con¬ 
sidered “random,” thus allowing us to assume that we have a statistical universe 
U in which x is a random variable with some distribution law f(x). f(x) is, in 
general, unknown and cannot be determined until long after the refined manu¬ 
facturing operation has been under way. Two types of situations arise in prac¬ 
tice, one in which x is a discrete variable taking on only certain isolated values 
as for example 1, 2, 3, $\cdots$, etc. with corresponding probabilities p(1), p(2), $\cdots$, 
the other being that in which x is essentially a continuous variable over some 
range with a corresponding probability density function f(x). In this paper we 
shall consider the latter type of variable. 

The problem now arises as to how we should calculate a tolerance range 
$(L_1, L_2)$ for x from a sample, and how large the sample should be in order for 
the tolerance range to have a given degree of stability. More specifically, for a 
given method of calculating tolerance limits, how large should our sample be in order 
that the proportion P of the universe included between $L_1$ and $L_2$ have an average 
value a, and will be such that the probability is at least p that P will lie between 
two given numbers, say b and c? For example, if a tolerance range is obtained 
by using a truncated sample range, that is by letting $L_1$ be the greatest of the r 
smallest values in a sample and $L_2$ the smallest of the r largest values, r being 
chosen so that E(P) = .99, how large should the sample size, say n, be in order 
for the probability to be .9 that P would lie between .985 and .995? A similar 
question can be asked when the setting of only one tolerance limit is under 
consideration. 

1 W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van 
Nostrand Company, New York, 1931. 
2. Tolerance ranges from truncated sample ranges. Suppose that nothing is 
known about the distribution function f(x) except enough to enable us to assume 
that it is continuous. Let a be the average value which P is to have, and suppose 
a sample of size n is drawn from the universe U so that $[(1 - a)(n + 1)]/2 = r$, 
say, is a positive integer. Let $x_1, x_2, \cdots, x_n$ be the sample values of x arranged 
in order of increasing magnitude. Let $L_1 = x_r$ and $L_2 = x_{n-r+1}$. The 
distribution law, say g(P), of P, the proportion of the universe included between these 
values of $L_1$ and $L_2$, is given by 

(1) $g(P)\,dP = \frac{\Gamma(n + 1)}{\Gamma[a(n + 1)]\,\Gamma[(1 - a)(n + 1)]}\,P^{a(n+1)-1}(1 - P)^{(1-a)(n+1)-1}\,dP.$


This follows at once from the joint distribution law of $x_r$ and $x_{n-r+1}$ which can be 
derived as follows: Consider the x axis as being divided into k mutually exclusive 
intervals $I_1, I_2, \cdots, I_k$ with $p_1, p_2, \cdots, p_k$ as the associated probabilities 
$\left(\sum_i p_i = 1\right)$. In a sample of size n the probability that $n_1, n_2, \cdots, n_k$ 
$\left(\sum_i n_i = n\right)$ values of x will fall into $I_1, I_2, \cdots, I_k$ respectively is given by 
the well-known multinomial distribution law 

(2) $\frac{n!}{n_1!\,n_2! \cdots n_k!}\;p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}.$


To get the distribution of $x_r$ and $x_{n-r+1}$ we take k = 5 and for $I_1, I_2, \cdots, I_5$ 
we take the intervals $(-\infty, x_r)$, $(x_r, x_r + dx_r)$, $(x_r + dx_r, x_{n-r+1})$, $(x_{n-r+1}, 
x_{n-r+1} + dx_{n-r+1})$, $(x_{n-r+1} + dx_{n-r+1}, \infty)$ respectively. The values of $p_1, p_2, \cdots, p_5$ 
are the integrals of f(x) dx over these five intervals respectively and the values 
of $n_1, n_2, \cdots, n_5$ are r − 1, 1, n − 2r, 1, r − 1 respectively. By substituting 
these values of the p's and n's in (2) and neglecting terms of order higher than 
$dx_r\,dx_{n-r+1}$ the probability element for $x_r$ and $x_{n-r+1}$ is found at once to be² 

(3) $\frac{n!}{[(r - 1)!]^2\,(n - 2r)!} \left(\int_{-\infty}^{x_r} f(x)\,dx\right)^{r-1} \left(\int_{x_{n-r+1}}^{\infty} f(x)\,dx\right)^{r-1} \left(\int_{x_r}^{x_{n-r+1}} f(x)\,dx\right)^{n-2r} f(x_r)\,f(x_{n-r+1})\,dx_r\,dx_{n-r+1}.$


Now let $\int_{-\infty}^{x_r} f(x)\,dx = u$, $\int_{x_{n-r+1}}^{\infty} f(x)\,dx = v$, then since $du = f(x_r)\,dx_r$ and 
$dv = -f(x_{n-r+1})\,dx_{n-r+1}$, the probability element of u and v may be written as 

(4) $\frac{\Gamma(n + 1)}{\Gamma^2(r)\,\Gamma(n - 2r + 1)}\;u^{r-1} v^{r-1} (1 - u - v)^{n-2r}\,du\,dv,$


2 For a discussion and a rather complete bibliography of the probability theory of 
“extreme values” such as $x_r$ and $x_{n-r+1}$ see E. J. Gumbel, “Les valeurs extrêmes des 
distributions statistiques,” Annales de l'Institut H. Poincaré (1935). 
the region of u and v of non-zero probability being the triangle bounded by the 
u and v axes and the line u + v = 1. Making the change of variables 
1 − u − v = P and u = Q, integrating with respect to Q, and setting r = 
(1/2)(1 − a)(n + 1) we find the distribution of P, the proportion of the 
universe included between $x_r$ and $x_{n-r+1}$, to be (1). It should be remarked that even 
if $L_1$ and $L_2$ are obtained by asymmetrical truncation by taking $L_1 = x_s$, $L_2 = x_t$ 
with $t - s = n - 2r + 1$, the distribution of $\int_{x_s}^{x_t} f(x)\,dx$ remains unchanged. 

Thus for a given p, by taking $L_1 = x_s$ and $L_2 = x_t$ where $t - s = n - 2r + 1 = 
a(n + 1)$, and choosing the smallest value of n for which $\int_b^c g(P)\,dP \ge p$ 
and such that (1 − a)(n + 1) is a positive integer we have provided the answer 
to the italicized question for one method of calculating $L_1$ and $L_2$; a method 
which is valid for any unknown continuous distribution f(x). 

As an example, suppose we take a = .99, b = .985, c = .995 and p = .99. 
The size of sample required is found to be 1000 (999 to be exact). In fact in 
this case the probability of P being between .985 and .995 is .992. In this 
example, we may therefore make the statement that if x is a continuous variable 
under statistical control, and if samples of size 1000 are taken, the tolerance 
limits $L_1$ and $L_2$ taken as the fifth smallest and fifth largest values of x in the 
sample respectively, will, on the average, include 99% of the universe between 
them and furthermore, the tolerance limits calculated in this way for samples 
of size 1000 will, in about 99.2% of the samples, include between 98.5% and 
99.5% of the universe between them. 

If $L_1$ and $L_2$ are taken as the smallest and largest values of x in the sample 
respectively (corresponding to r = 1, i.e. sample range with no truncation), 
then in samples of size 1000, these tolerance limits will, on the average, include 
99.8% of the universe between them and the probability is .996 that $L_1$ and $L_2$ 
will include at least 99.5% of the universe between them. If the largest and 
smallest values of x in samples are used as tolerance limits and if we wish to 
state that the probability is .99 that such tolerance limits will include at least 
99% of the universe, the size of sample required is 660. If the probability is 
lowered to .95 of including at least 99% of the universe, with such tolerance 
limits, the size of sample required is 130. Engineering statisticians³ have 
pointed out on the basis of practical experience the need of using samples of 100 to 
1000 or even more cases in order to set tolerance limits which will include at 
least 99% of the universe with a satisfactorily high degree of certainty. The 
examples we have given based on sizes 1000, 660 and 130 will indicate the degree 
of stability to be expected for tolerance ranges for samples in this range of sizes. 
The degree of stability of the tolerance limits for samples of the size range 500 
to 1000 appears to be of about the order of that demanded by the engineering 
statistician. 
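For the untruncated range (r = 1) the distribution (1) has exponents n − 2 and 1, and its integral from $\gamma$ to 1 has the closed form $1 - n\gamma^{n-1} + (n - 1)\gamma^n$. The sketch below evaluates this expression and the mean coverage $(n - 1)/(n + 1)$. The function name is hypothetical and the closed form is a derivation from (1), not a formula quoted from the paper.

```python
def coverage_confidence(n, gamma):
    """P(sample range covers at least a proportion gamma), from (1) with r = 1."""
    return 1 - n * gamma ** (n - 1) + (n - 1) * gamma ** n

def mean_coverage(n):
    """Average proportion covered by the sample range, (n - 1)/(n + 1)."""
    return (n - 1) / (n + 1)

print(round(mean_coverage(1000), 4))
print(round(coverage_confidence(660, 0.99), 4))
```

With n = 1000 the mean coverage comes out close to 99.8%, and with n = 660 the probability of covering at least 99% of the universe comes out close to .99, in line with the corresponding figures quoted in the text.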

3 Cf. W. A. Shewhart, Statistical Methods from the Point of View of Quality Control, The 
Graduate School of the U.S. Department of Agriculture, Washington (1939). P. 63. 
In some cases it may be desirable to determine the size of samples so as to 
control the tolerance limits $L_1$ and $L_2$ individually, that is so that the probability 
is at least p that the proportions of the universe contained in the tails of the 
distribution cut off by $L_1$ and $L_2$ are in both cases between two given numbers, 
say d and e. In this case we would determine the least value of n so that 

(5) $\int_d^e \int_d^e h(u, v)\,du\,dv \ge p$


where $h(u, v)\,du\,dv$ denotes the function given by (4). For example, suppose 
p = .99, d = 0, e = .005, r = 1. The size of the sample needed is 1060. 
Thus in samples of size 1060, the probability is .99 that $L_1$ and $L_2$ taken as 
the smallest and the largest values in the sample respectively will cut off tails 
of the universe such that each tail will include not more than 0.5% of the universe. 
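When r = 1 and d = 0 the double integral in (5) can be evaluated in closed form as $1 - 2(1 - e)^n + (1 - 2e)^n$, either by direct integration of (4) or by inclusion and exclusion over the two tails. The following sketch uses that expression. The function names are hypothetical and the search routine is only one simple way to locate the least n.

```python
def both_tails_confidence(n, e):
    """P(u <= e and v <= e) for r = 1, d = 0: 1 - 2(1-e)^n + (1-2e)^n."""
    return 1 - 2 * (1 - e) ** n + (1 - 2 * e) ** n

def smallest_n(p, e):
    """Least sample size whose two-tail confidence reaches p."""
    n = 2
    while both_tails_confidence(n, e) < p:
        n += 1
    return n

print(round(both_tails_confidence(1060, 0.005), 4))
print(smallest_n(0.99, 0.005))
```

The exact search returns a value a little below the rounded figure of 1060 given in the text, and at n = 1060 the probability is just over .99.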

If it is desired to set only one tolerance limit, say $L_1$, then the distribution 
of u would be used. This can be found by integrating (4) with respect to v. 
The distribution is 

(6) $\frac{\Gamma(n + 1)}{\Gamma(r)\,\Gamma(n - r + 1)}\;u^{r-1}(1 - u)^{n-r}\,du.$


The probability p that the proportion of the universe in the tail which will be 
cut off by $L_1$ is between d and e is given by integrating the expression (6) from 
d to e. The value of n required to obtain any given value of p can then be 
determined. For example, in the case where p = .99, d = 0, e = .005, r = 1, 
the size of the sample needed is 920. 
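For r = 1 and d = 0 the integral of (6) from 0 to e is $1 - (1 - e)^n$, so the required sample size can be solved for directly. The sketch below is an illustration with hypothetical function names.

```python
import math

def one_tail_confidence(n, e):
    """Integral of (6) from 0 to e when r = 1: 1 - (1 - e)^n."""
    return 1 - (1 - e) ** n

def smallest_n_one_tail(p, e):
    """Least n with 1 - (1 - e)^n >= p, obtained by taking logarithms."""
    return math.ceil(math.log(1 - p) / math.log(1 - e))

print(smallest_n_one_tail(0.99, 0.005))
print(round(one_tail_confidence(920, 0.005), 4))
```

The exact least value is within one unit of the rounded figure of 920 quoted in the text.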


3. Tolerance range for a normal universe. The method of setting tolerance 
limits discussed in Section 2 assumes nothing about the distribution f(x) except 
that it is continuous. If f(x) can be assumed to have a given functional form 
involving unknown parameters, methods based on the theory of statistical 
estimation and having greater efficiency than those already discussed could be 
used for setting tolerance limits. We shall not go into a general discussion of 
such methods here although it does appear desirable to consider one very 
important example of the application of the methods. Suppose f(x) can be assumed 
to be a normal distribution function with unknown mean m and variance $\sigma^2$. 

n 

In a sample of size n let x be the sample mean and let s 2 = (x,- — xf/{n — 1). 

i 

Let us consider as tolerance limits L[ and La the quantities x db Jcs . The pro¬ 
portion P f of the universe included between these limits is 

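$$P' = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\bar{x}-ks}^{\bar{x}+ks} e^{-\frac{1}{2}(x-m)^2/\sigma^2}\,dx. \tag{7}$$

We wish to determine k so that $E(P') = \alpha$. It can be verified by straightforward
analysis that $E(P')$, defined by $\int_{-\infty}^{\infty}\int_0^{\infty} P' f(\bar{x}, s)\,ds\,d\bar{x}$, has the value

$$\frac{\Gamma(n/2)}{\sqrt{\pi(n-1)}\,\Gamma((n-1)/2)} \int_{-t_\alpha}^{t_\alpha} \frac{dt}{(1 + t^2/(n-1))^{n/2}}, \qquad t_\alpha = k\sqrt{\frac{n}{n+1}}, \tag{8}$$

where $f(\bar{x}, s)$ is the well-known distribution of $\bar{x}$ and $s$ given by

$$f(\bar{x}, s) = \frac{\sqrt{n}\,(n-1)^{(n-1)/2}\, s^{n-2}}{2^{(n-2)/2}\,\sigma^{n}\sqrt{\pi}\,\Gamma((n-1)/2)}\; e^{-[n(\bar{x}-m)^2 + (n-1)s^2]/(2\sigma^2)}. \tag{9}$$

Therefore the tolerance limits $L_1'$ and $L_2'$ which will include, on the average,
a proportion $\alpha$ of the universe between them are

$$\bar{x} \pm t_\alpha \sqrt{(n+1)/n}\cdot s \tag{10}$$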

where $t_\alpha$ is the value of t for which the integral in (8) has the value $\alpha$. The
value of $t_\alpha$ can be found from Fisher's t table for n - 1 degrees of freedom, and
for certain values of $\alpha$ including .99, .95, etc. and for values of n up to 30.
Although the tolerance limits (10) will include, on the average, the proportion $\alpha$
of the universe between them, we must now investigate the size of sample
needed to obtain a given degree of stability of $P'$. The exact distribution of $P'$
seems to be too complicated to be of any practical value. It is not difficult to
verify that to within terms of order 1/n, the variance of $P'$ is given by

$$\sigma_{P'}^2 = t_\alpha^2\, e^{-t_\alpha^2}/(\pi n). \tag{11}$$


The variance of P, the proportion of the universe included between $x_r$ and
$x_{n-r+1}$, to within terms of order 1/n is given by

$$\sigma_P^2 = \alpha(1 - \alpha)/n. \tag{12}$$

For a large sample of a given size, say n = 100 or more, a simple comparison
of the stabilities of the two tolerance ranges $(x_r, x_{n-r+1})$ and $(\bar{x} \pm t_\alpha\sqrt{(n+1)/n}\cdot s)$
can be made by comparing $\sigma_P^2$ and $\sigma_{P'}^2$. For $\alpha = .99$, the efficiency ratio $\sigma_{P'}^2/\sigma_P^2$
is .28, indicating that for large n and when the universe is normal, samples of
size .28n have the same degree of stability in setting tolerance ranges (10) as a
sample of size n has when $(x_r, x_{n-r+1})$ is taken as the tolerance range. The same
thing may be viewed in another way: The fact that the range of values of $P'$ is
0 to 1 suggests that we may be able to get a fairly close approximation to the
true distribution of $P'$ by fitting a Pearson Type I function of the form


$$\frac{\Gamma(a + \beta)}{\Gamma(a)\,\Gamma(\beta)}\, P'^{\,a-1}(1 - P')^{\beta-1}\,dP', \tag{13}$$

determining a and $\beta$ by equating the mean and variance of the distribution (13)
to the mean and variance of $P'$ respectively. Accordingly we find

$$a = [\alpha^2(1 - \alpha) - \alpha\sigma_{P'}^2]/\sigma_{P'}^2, \qquad \beta = [\alpha(1 - \alpha)^2 - (1 - \alpha)\sigma_{P'}^2]/\sigma_{P'}^2. \tag{14}$$

Thus it will be seen from (14) that in order for the fitted distribution (13) to be
identical with the distribution (1) a sample of only
$\frac{t_\alpha^2\, e^{-t_\alpha^2}}{\pi\alpha(1 - \alpha)}\,(n + 2)$ cases is needed.

In case only one tolerance limit is to be set, e.g. $\bar{x} - t_\alpha\sqrt{(n+1)/n}\cdot s$, the
proportion, say $u'$, of the universe which will be included in the tail has mean
value $(1 - \alpha)/2$ and variance $\frac{t_\alpha^2 + 2}{4\pi n}\, e^{-t_\alpha^2}$ (approximately) for large n. The
ratio of this variance to that of u, which is approximately $(1 - \alpha^2)/4n$ for
large n, gives the efficiency of using $x_r$ for the lower tolerance limit in case of a
normal universe. For example, if $\alpha = .99$, the efficiency is .18.
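The two efficiency figures, .28 for the tolerance range and .18 for a single limit, can be reproduced from the large-sample variances. The sketch below rests on assumptions not stated in the text: $t_\alpha$ is replaced by its large-n limit, the normal deviate with two-sided probability $\alpha$, and the one-limit variances are taken as $(t^2 + 2)e^{-t^2}/(4\pi n)$ and $q(1 - q)/n$ with $q = (1 - \alpha)/2$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def normal_deviate(alpha: float) -> float:
    """Large-n stand-in for t_alpha: the deviate with central probability alpha."""
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)


def range_efficiency(alpha: float) -> float:
    """Ratio of variance (11) to variance (12); the common factor 1/n cancels."""
    t = normal_deviate(alpha)
    return (t * t * math.exp(-t * t) / math.pi) / (alpha * (1.0 - alpha))


def single_limit_efficiency(alpha: float) -> float:
    """Ratio of the variance of u' to that of u for one tolerance limit."""
    t = normal_deviate(alpha)
    q = (1.0 - alpha) / 2.0  # mean proportion in the tail
    var_normal = (t * t + 2.0) * math.exp(-t * t) / (4.0 * math.pi)
    var_order = q * (1.0 - q)
    return var_normal / var_order


print(round(range_efficiency(0.99), 2), round(single_limit_efficiency(0.99), 2))
```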

It is perhaps appropriate here to point out the distinction between confidence
limits and tolerance limits. It is well known that in a sample from a normal
universe with mean m the probability is $\alpha$ that the confidence limits $\bar{x} \pm t_\alpha s/\sqrt{n}$
will include the population mean m between them. The tolerance limits
$\bar{x} \pm t_\alpha\sqrt{(n+1)/n}\cdot s$, on the other hand, are used to estimate the middle $100\alpha\%$
of the universe. Although the tolerance limits $\bar{x} \pm t_\alpha\sqrt{(n+1)/n}\cdot s$ are much
more stable for a given sample size than those given by $x_r$ and $x_{n-r+1}$ in case
of a normal distribution, it should be emphasized that in case of even slight
non-normality, particularly when skewness is present, the former pair of limits
are apt to give very erroneous results with reference to the proportion of the
universe included in the tails. Confidence limits estimating m are probably
much less sensitive to skewness than tolerance limits estimating the middle
$100\alpha\%$ of the universe, particularly when $\alpha$ is nearly unity.

Another important aspect of the problem of setting tolerance limits is the
following: Suppose small samples of a given size are taken from a universe
under statistical control. How many of these small samples should be taken
as a basis for determining tolerance limits $L_1$ and $L_2$ of some function, say g,
of the samples (e.g. the sum of the measurements in each sample) so that the
proportion of samples in the universe of such samples having values of g between
$L_1$ and $L_2$ will have a given mean with a given degree of stability? One obvious
approach to this question is to consider a universe of samples in the same manner
in which we have considered a universe of individuals throughout the present
paper. This approach, however, does not make very efficient use of the
observations, but we shall not enter into a treatment of the problem here. This problem
and various related problems in the statistical methods of mass production
remain to be studied.


4. Summary. A method based on truncated sample ranges for determining 
size of sample required for setting tolerance limits on a random variable x having 
any unknown continuous distribution f(x) and having a given degree of stability 
is given. A method for setting tolerance limits corresponding to a given degree 
of stability in case f(x) is normal is discussed and a comparison of the stabilities 
of the tolerance limits set by the two methods in the normal case is made. 
Illustrative examples of the methods are given.



ON A CERTAIN CLASS OF ORTHOGONAL POLYNOMIALS 


By Frank S. Beale 
Lehigh University, Bethlehem, Pennsylvania 

Introduction. E. H. Hildebrandt has demonstrated the following theorem:¹
If y is a non-identically zero solution of the Pearsonian Differential Equation,

$$\frac{1}{y}\frac{dy}{dx} = \frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2} = \frac{N}{D}, \qquad a_i,\ b_i \text{ real}, \tag{1}$$

then

$$\frac{D^{n-k}}{y}\frac{d^n}{dx^n}(D^k y) = P_n(k, x), \qquad n,\ k \text{ integers},\ n \geq 0, \tag{2}$$

is a polynomial in x of degree n at most. Hildebrandt has obtained various relations
connecting the $P_n(k, x)$ and their derivatives as well as a recurrence relation.

If in (2) we set k = n there results from a proper choice of N and D in (1),
the classical Hermite, Laguerre, Jacobi and Legendre Polynomials. Many
properties of these classical polynomials have been obtained by numerous
investigators.²

One of the most important of these properties is that of orthogonality which
can be stated as follows: Consider a sequence of the classical polynomials
$\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$. There exists an interval (a, b) finite or infinite and a unique
weight function $\psi(x)$, monotonic non-decreasing over (a, b) such that,

$$\int_a^b \Phi_m(x)\Phi_n(x)\,d\psi(x) = 0, \qquad \text{for } n \neq m. \tag{3}$$

In the future we will refer to the type of orthogonality given by (3) with $\psi(x)$
monotonic non-decreasing as orthogonality in the restricted sense. In order to determine
whether a given system of polynomials is orthogonal in the restricted sense we
have the following theorem:³

Theorem 1. In order that the sequence of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$,
i = 1, 2, 3, ···, with real coefficients be orthogonal in the restricted sense it is
necessary and sufficient that there exist a recurrence relation,

$$\Phi_i(x) = (x - c_i)\Phi_{i-1}(x) - \lambda_i\Phi_{i-2}(x), \qquad \Phi_0 = 1,\ \Phi_1 = x - c_1, \tag{4}$$

$c_i$, $\lambda_i$ const. with all $\lambda_i > 0$, $i \geq 2$.

¹ E. H. Hildebrandt, "Systems of polynomials connected with the Charlier expansions,
etc.," Annals of Math. Stat., Vol. 2 (1931), pp. 379-439.

² For an account of these properties as well as an extensive bibliography the reader can
refer to one of two treatises, viz.: J. Shohat, Théorie Générale des Polynomes Orthogonaux de
Tchebichef, Mémorial des Sciences Mathématiques, Fascicule 66, Paris, Gauthier-Villars,
1936.

Gabor Szegö, Orthogonal Polynomials, Am. Math. Soc., Colloquium Publications, Vol.
23, 1939.

³ J. Shohat, "The relation of the classical orthogonal polynomials to the polynomials of
Appell," Am. Jour. of Math., Vol. 58 (1936), pp. 454-455.

With Shohat⁴ we will say that a system of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$,
i = 1, 2, 3, ···, with real coefficients is orthogonal in the general sense if
there exists at least one weight function $\psi(x)$, of bounded variation over (a, b) such
that (3) is satisfied. In connection with generalized orthogonality we have the
following theorem:⁴

Theorem 2. In order that the system $\Phi_i(x)$, i = 1, 2, 3, ··· be orthogonal in
the general sense it is necessary and sufficient that relation (4) be satisfied with all
$\lambda_i \neq 0$.

It is the purpose of this paper to investigate the orthogonality properties of
the general polynomials $P_n(n, x)$ given by (2). In Part 1 a general recurrence
relation is derived which applies to all the polynomials $P_n(k, x)$. In Part 2 all
the different types of orthogonal polynomials $P_n(n, x)$ are determined by making
use of the general recurrence relation derived in Part 1. We also show, following
lines laid down by Hahn⁵, that the only systems of polynomials with simple
zeros which are orthogonal in either the restricted or the general sense and whose
derivatives are orthogonal in either sense are the systems considered in Part 2.


1. The general recurrence relation. From (2) we can write,

$$P_{n-1}(k, x) = \frac{D^{n-k-1}}{y}\frac{d^{n-1}}{dx^{n-1}}(D^k y) = \frac{D^{n-k-1}}{y}\frac{d^{n-1}}{dx^{n-1}}[D\cdot D^{k-1}y]. \tag{5}$$

Apply Leibnitz' Formula to the right side and make use of (2). There results,

$$P_{n-1}(k, x) = P_{n-1}(k-1, x) + (n-1)D'P_{n-2}(k-1, x) + \frac{(n-1)(n-2)}{1\cdot 2}\,D''D\,P_{n-3}(k-1, x). \tag{6}$$

From Hildebrandt's paper we have,⁶

$$P_{n+1}(k+1, x) = [N + (k+1)D']\,P_n(k, x) + n[N' + (k+1)D'']\,D\,P_{n-1}(k, x). \tag{7}$$

Decrease k and n each by one in (7) and obtain a relationship which we number
(8). Again decrease n by one in (8) and get a relation which we number (9).


⁴ J. Shohat, "Sur les polynomes orthogonaux généralisés," Comptes Rendus, Vol. 207
(1938), p. 556.

⁵ Wolfgang Hahn, "Über die Jacobischen Polynome und zwei verwandte Polynomklassen,"
Math. Zeits., Vol. 39 (1934-35), pp. 634-638.

⁶ E. H. Hildebrandt, loc. cit. p. 407.

From (6), (7), (8) and (9) eliminate $P_{n-1}(k, x)$, $P_{n-2}(k-1, x)$, and $P_{n-3}(k-1, x)$.
There results,

$$\begin{aligned}
[2N' + (2k-n+1)D'']&[N' + kD'']\,P_{n+1}(k+1, x) \\
&= \{[2N' + (2k-n+1)D''][N' + kD''][N + (k+1)D'] \\
&\qquad + n[N' + (k+1)D''][2N'D' + kD'D'' - ND'']\}\,P_n(k, x) \\
&\quad + n[N' + (k+1)D'']\{2(N' + kD'')^2 D \\
&\qquad - (N + kD')(2N'D' + kD'D'' - ND'')\}\,P_{n-1}(k-1, x).
\end{aligned} \tag{10}$$

In (10) decrease n and k each by one and replace N and D by their values from
(1). Thus we get,

$$\begin{aligned}
[a_1 + (2k-n)b_2]&[a_1 + 2(k-1)b_2]\,P_n(k, x) \\
&= \{[a_1 + (2k-2)b_2][a_1 + 2kb_2][a_1 + (2k-1)b_2]\,x \\
&\qquad + [a_1 + (2k-2)b_2][a_1 + (2k-n)b_2][a_0 + kb_1] \\
&\qquad + (n-1)[a_1 + 2kb_2][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}\,P_{n-1}(k-1, x) \\
&\quad + (n-1)[a_1 + 2kb_2]\{b_0[a_1 + (2k-2)b_2]^2 \\
&\qquad - [a_0 + (k-1)b_1][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}\,P_{n-2}(k-2, x).
\end{aligned} \tag{11}$$

In this recurrence formula the $P_n(k, x)$ have in general a coefficient of $x^n$
different from one. Polynomials which have one for the coefficient of $x^n$ we will refer
to in the future as normalized. Let us now transform (11) for normalized $P_n(k, x)$.
Theorem 1 deals with polynomials normalized in the above sense. Let us write,
$P_n(k, x) = a_{n,k}x^n - b_n x^{n-1} + \cdots$. In (4) set $\Phi_n(x) = P_n(k, x)/a_{n,k}$.
Thus we get,

$$P_n(k, x) = (A_n x - B_n)P_{n-1}(k-1, x) - \gamma_n P_{n-2}(k-2, x) \tag{12}$$

where

$$\gamma_n = \frac{a_{n,k}}{a_{n-2,k-2}}\,\lambda_n, \qquad A_n = \frac{a_{n,k}}{a_{n-1,k-1}}, \qquad \text{and} \qquad B_n = \frac{a_{n,k}}{a_{n-1,k-1}}\,c_n.$$

Relation (12) is essentially of the same form as (11). Each of these is to be
reduced to form (4).

From a previous paper by the author⁷ we have,

$$P'_{n+1}(k, x) = (n+1)[N' + \tfrac{1}{2}(2k-n)D'']\,P_n(k, x). \tag{13}$$

n - 1 successive applications of this relation give us, [$P_0(k, x) \equiv 1$], that the
coefficient of $x^n$ in $P_n(k, x)$ is,

$$a_{n,k} = \prod_{i=0}^{n-1}[a_1 + (2k-n+1+i)b_2]. \tag{14}$$

By employing (14) in (12) we see that (12) or (11) reduces to form (4) where,

$$c_n = -\frac{[a_1 + (2k-n)b_2][a_0 + kb_1]}{[a_1 + 2kb_2][a_1 + (2k-1)b_2]} - \frac{(n-1)[a_1b_1 + (k-1)b_1b_2 - a_0b_2]}{[a_1 + (2k-1)b_2][a_1 + (2k-2)b_2]}, \tag{15}$$

$$\lambda_n = -(n-1)\,\frac{[a_1 + (2k-n-1)b_2]\{b_0[a_1 + (2k-2)b_2]^2 - [a_0 + (k-1)b_1][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}}{[a_1 + (2k-3)b_2][a_1 + (2k-2)b_2]^2[a_1 + (2k-1)b_2]}. \tag{16}$$

Equation (16) together with Theorems 1 and 2 can now be applied to the
polynomials $P_n(k, x)$.

From (14) it is seen that $P_n(k, x)$ is of degree n provided that none of the factors
of the product vanishes. This condition we assume to hold here for all n.

We can now obtain a recurrence relation for the qth derivatives of $P_n(k, x)$.
A repeated application of (13) leads to,

$$\frac{d^q}{dx^q}P_n(k, x) = P_{n-q}(k, x)\prod_{i=0}^{q-1}(n-i)[a_1 + (2k-n+i+1)b_2], \tag{17}$$

where $P_n(k, x)$ is not normalized in the above sense. By considering the right
side of (17) together with (14) we see that (17) can be divided by

$$a_{n-q,k}\prod_{i=0}^{q-1}(n-i)[a_1 + (2k-n+i+1)b_2]$$

and thus normalize the polynomials on both the right and left sides of (17).
Consequently the recurrence relation for normalized $d^q[P_n(k, x)]/dx^q$, n =
0, 1, 2, ···, is identical with the recurrence relation for normalized $P_{n-q}(k, x)$
as given by (4), (15) and (16) when we replace n by n - q in these latter.

⁷ Frank S. Beale, "On the polynomials related to Pearson's differential equation,"
Annals of Math. Stat., Vol. 8 (1937), p. 207 (2).


2. The different types of orthogonal $P_n(n, x)$. Suppose first that $b_2 \neq 0$ in
(1). A transformation on x with real coefficients can be effected which changes
(1) into either,

$$\frac{1}{y}\frac{dy}{dx} = \frac{(\alpha - \beta) + (-\alpha - \beta)x}{1 - x^2} \tag{18}$$

or

$$\frac{1}{y}\frac{dy}{dx} = \frac{-2mx - q}{a^2 + x^2}. \tag{19}$$

(A) Equation (18) together with (2) for k = n defines the generalized Jacobi
Polynomials (normalized in the above sense),

$$J_n(x, \alpha, \beta) = \frac{1}{a_{n,n}}(1 + x)^{-\alpha}(1 - x)^{-\beta}\frac{d^n}{dx^n}[(1 + x)^{n+\alpha}(1 - x)^{n+\beta}]$$

where $a_{n,n}$ is given by (14). If in (16) we set k = n and make proper
replacements for constants as (18) and (1) show we have,

$$\lambda_n = \frac{4(n-1)(\alpha + \beta + n - 1)(\alpha + n - 1)(\beta + n - 1)}{(\alpha + \beta + 2n - 3)(\alpha + \beta + 2n - 2)^2(\alpha + \beta + 2n - 1)}, \qquad n \geq 2. \tag{20}$$

From Theorem 1 and this value of $\lambda_n$ we conclude that if $\alpha > -1$, $\beta > -1$,
the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal in the restricted sense, a well-known
result. From Theorem 2 we can similarly conclude that if neither $\alpha$, $\beta$, nor
$(\alpha + \beta)$ equals -j, j a positive integer, the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal
in the general sense.

(A₁) If in (18) we set $\alpha = \beta = 0$ we obtain a differential equation which
together with (2) for k = n leads to the Legendre Polynomials (normalized in the
above sense), $P_n(x) = \frac{n!}{(2n)!}\frac{d^n}{dx^n}(x^2 - 1)^n$. Setting $\alpha = \beta = 0$ in (20) leads to
$\lambda_n = \frac{(n-1)^2}{(2n-3)(2n-1)}$, $n \geq 2$. Thus from Theorem 1 we conclude that the
Legendre Polynomials are orthogonal in the restricted sense, a result well known.
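The Legendre case lends itself to an exact check. The sketch below builds the normalized polynomials from the recurrence (4) with $c_i = 0$ and $\lambda_n = (n-1)^2/((2n-3)(2n-1))$, and integrates their products over (-1, 1) in rational arithmetic. The weight $d\psi(x) = dx$ on (-1, 1) is the classical one and is assumed here; the helper names are illustrative only.

```python
from fractions import Fraction


def polys_from_recurrence(n_max, c, lam):
    """Phi_0..Phi_n_max from (4); each polynomial is a list of
    coefficients in increasing powers of x."""
    polys = [[Fraction(1)]]
    if n_max >= 1:
        polys.append([-c(1), Fraction(1)])
    for i in range(2, n_max + 1):
        prev, prev2 = polys[i - 1], polys[i - 2]
        new = [Fraction(0)] + list(prev)      # x * Phi_{i-1}
        for j, coef in enumerate(prev):       # - c_i * Phi_{i-1}
            new[j] -= c(i) * coef
        for j, coef in enumerate(prev2):      # - lambda_i * Phi_{i-2}
            new[j] -= lam(i) * coef
        polys.append(new)
    return polys


def integrate_product(p, q, a=-1, b=1):
    """Exact integral of p(x) q(x) dx from a to b."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            k = i + j + 1
            total += pi * qj * (Fraction(b) ** k - Fraction(a) ** k) / k
    return total


def legendre_lambda(n):
    return Fraction((n - 1) ** 2, (2 * n - 3) * (2 * n - 1))


legendre = polys_from_recurrence(6, lambda i: Fraction(0), legendre_lambda)
print(legendre[2], legendre[3])
```

Since every $\lambda_n$ is positive, Theorem 1 applies, and the computed integrals of $\Phi_m\Phi_n$ vanish exactly for $m \neq n$.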

(B) Equation (19) together with (2) for k = n leads to a class of polynomials
(normalized in above sense), mentioned by Romanovsky.⁸

$$R_n(x, m, q, a) = \frac{1}{a_{n,n}}(a^2 + x^2)^m \exp\left(\frac{q}{a}\tan^{-1}\frac{x}{a}\right)\frac{d^n}{dx^n}\left[(a^2 + x^2)^{n-m}\exp\left(-\frac{q}{a}\tan^{-1}\frac{x}{a}\right)\right]$$

where again $a_{n,n}$ is given by (14). In (16) set k = n and make the proper
replacements of constants and,

$$\lambda_n = \frac{n-1}{4}\cdot\frac{(2m - n + 1)\{4a^2(m - n + 1)^2 + q^2\}}{(2m - 2n + 3)(m - n + 1)^2(2m - 2n + 1)}, \qquad n \geq 2.$$

From Theorem 2 it now follows that the sequence $\{R_n(x, m, q, a)\}$ is orthogonal
in the general sense if $m \neq j/2$, j a positive integer. There is no set of parameters
m, q, a which assures orthogonality in the restricted sense.

In connection with Romanovsky's note there appear to be several discrepancies.
For the weight functions given there under types IV and V, the nth
moments for sufficiently large n do not exist over the intervals there considered.
Type V is the special case of type IV for a = 0. Type VI is none other than
Jacobi Polynomials so that the orthogonality relations given there for this case
are incorrect. In all three types listed certain of the recurrence relations for
the polynomials are in error.

(B₁) We note here one special sub-class of $R_n$. Take m = q = 0 and a = 1
in (19). We obtain from (2) and (14) a system of normalized polynomials
analogous to the Legendre Polynomials namely, $\Phi_n(x) = \frac{n!}{(2n)!}\frac{d^n}{dx^n}(x^2 + 1)^n$.
It is easy to verify for these that,

$$\int_{-i}^{i}\Phi_m(x)\Phi_n(x)\,dx = 0, \qquad m \neq n, \quad i = \sqrt{-1}.$$

⁸ V. Romanovsky, "Sur quelques classes nouvelles de polynomes orthogonaux," Comptes
Rendus, Vol. 188 (1929), pp. 1023-1025.


(C) Suppose that in (1), $b_2 = 0$, $b_1 \neq 0$. A linear transformation with real
coefficients changes (1) into, $\frac{1}{y}\frac{dy}{dx} = \frac{\alpha - x}{x}$. This equation together with (2)
and (14) for k = n defines the generalized Laguerre Polynomials (normalized
in above sense), $L_n(x, \alpha) = (-1)^n x^{-\alpha}e^{x}\frac{d^n}{dx^n}[x^{n+\alpha}e^{-x}]$. Setting k = n and making
proper replacements in (16) we get, $\lambda_n = (n-1)(\alpha + n - 1)$, $n \geq 2$. From
Theorem 1 we see that if $\alpha > -1$ the $L_n$ are orthogonal in the restricted sense,
a well-known result. From Theorem 2 we can say that if $\alpha \neq -j$, j a positive
integer, the polynomials are orthogonal in the general sense.

(D) If in (1), $b_1 = b_2 = 0$, $b_0 \neq 0$ we can perform a linear transformation on
x with real coefficients and get, $\frac{1}{y}\frac{dy}{dx} = hx$. This differential equation together
with (2) and (14) gives a set of normalized polynomials $G_n(x) = \frac{1}{h^n}e^{-hx^2/2}\frac{d^n}{dx^n}e^{hx^2/2}$.
Taking k = n and making proper substitutions for constants in (16) we get
$\lambda_n = -(n-1)/h$, $n \geq 2$. If h is negative it follows from Theorem 1 that the
sequence $\{G_n(x)\}$ is orthogonal in the restricted sense. In fact, $G_n(x) \equiv H_n(x)$ =
Hermite Polynomials.

On the other hand, if h is positive we have from Theorem 2 orthogonality in
the general sense. In fact, it can be easily verified for this case that,

$$\int_{-i\infty}^{i\infty} e^{hx^2/2}\,G_n(x)G_m(x)\,dx = 0, \qquad m \neq n, \quad i = \sqrt{-1}.$$

(E) The only remaining possibility for (1) not so far discussed occurs when
$N \equiv$ constant and D is linear. In this case it has been shown that $P_n(k, x)$
of (2) reduces to a constant.⁹
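The values of $\lambda_n$ quoted for cases (A) to (D) can be checked against the general expression (16) with k = n. The sketch below assumes that (16) reads $\lambda_n = -(n-1)[a_1 + (2k-n-1)b_2]\{b_0[a_1 + (2k-2)b_2]^2 - [a_0 + (k-1)b_1][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}$ divided by $[a_1 + (2k-3)b_2][a_1 + (2k-2)b_2]^2[a_1 + (2k-1)b_2]$, and that the constants of (1) in each case are those implied by the transformed equations. The function name and the trial parameter values are illustrative.

```python
from fractions import Fraction as F


def lambda_16(n, k, a0, a1, b0, b1, b2):
    """lambda_n of (16) for N = a0 + a1*x and D = b0 + b1*x + b2*x^2."""
    a0, a1, b0, b1, b2 = (F(v) for v in (a0, a1, b0, b1, b2))
    core = a1 * b1 + (k - 1) * b1 * b2 - a0 * b2
    num = -(n - 1) * (a1 + (2 * k - n - 1) * b2) * (
        b0 * (a1 + (2 * k - 2) * b2) ** 2 - (a0 + (k - 1) * b1) * core
    )
    den = (
        (a1 + (2 * k - 3) * b2)
        * (a1 + (2 * k - 2) * b2) ** 2
        * (a1 + (2 * k - 1) * b2)
    )
    return num / den


def jacobi_lambda(n, al, be):
    s = al + be
    return (4 * (n - 1) * (s + n - 1) * (al + n - 1) * (be + n - 1)
            / ((s + 2 * n - 3) * (s + 2 * n - 2) ** 2 * (s + 2 * n - 1)))


def romanovsky_lambda(n, m, q, a):
    return (F(n - 1, 4) * (2 * m - n + 1) * (4 * a * a * (m - n + 1) ** 2 + q * q)
            / ((2 * m - 2 * n + 3) * (m - n + 1) ** 2 * (2 * m - 2 * n + 1)))


# Trial parameter values, chosen only so that no denominator vanishes.
al, be = F(1, 2), F(3, 2)
m, q, a = F(1, 3), F(2), F(3)
h = F(-2)

checks = []
for n in range(2, 8):
    checks.append(lambda_16(n, n, al - be, -(al + be), 1, 0, -1) == jacobi_lambda(n, al, be))
    checks.append(lambda_16(n, n, -q, -2 * m, a * a, 0, 1) == romanovsky_lambda(n, m, q, a))
    checks.append(lambda_16(n, n, al, -1, 0, 1, 0) == (n - 1) * (al + n - 1))  # Laguerre
    checks.append(lambda_16(n, n, 0, h, 1, 0, 0) == -F(n - 1) / h)             # case (D)
print(all(checks))
```

Rational arithmetic is used so that the comparisons are exact rather than subject to rounding.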

E. H. Hildebrandt has shown¹⁰ that the polynomials $P_n(n, x)$ of (2) satisfy
a differential equation of the form,

$$(b_0 + b_1x + b_2x^2)\frac{d^2y}{dx^2} + [a_0 + b_1 + (a_1 + 2b_2)x]\frac{dy}{dx} - n[a_1 + (n+1)b_2]y = 0, \qquad n = 1, 2, 3, \cdots. \tag{21}$$

Moreover with the coefficients of $d^2y/dx^2$ and $dy/dx$ in (21) he has shown that
for (21) to have a polynomial solution of degree n the coefficient of y must be
of the form given in (21).

From (16) we can say that for k = n and an orthogonal sequence $P_n(n, x)$,
n = 0, 1, 2, ··· we have,

$$a_1 + (n-1)b_2 \neq 0, \tag{22}$$

$$b_0[a_1 + (2n-2)b_2]^2 - [a_0 + (n-1)b_1][a_1b_1 + (n-1)b_1b_2 - a_0b_2] \neq 0, \tag{23}$$

where n is an integer $\geq 2$. Considering for (21) a solution of the type
$y = \sum_{i=0}^{\infty} c_i x^i$ we readily show that if (22) and (23) are satisfied, (21) possesses for
each n a single polynomial solution of degree n. Two solutions which differ
merely by a constant factor are regarded as the same solution. This polynomial
solution of (21) must be $P_n(n, x)$.

By employing theorems from a previous paper by the author¹¹ we can show
that if (22) and (23) are satisfied, the zeros of the polynomials of section 2 are
simple whether these zeros are real or complex.

Hahn has shown¹² that if a set of normalized polynomials and their derivatives
satisfy a relation of the form (4) with $\lambda_i \neq 0$ and if the zeros of the
polynomials are all simple then the polynomials must necessarily satisfy an equation
of form (21). Since in this paper we have considered all possible values of
$a_i$, (i = 0, 1), and $b_i$, (i = 0, 1, 2), which lead to orthogonal polynomials, it
follows that the only systems of polynomials with simple zeros and orthogonal
in either restricted or general sense whose derivatives in turn are orthogonal in
either sense are the systems of section 2.

⁹ Frank S. Beale, loc. cit. p. 209, Theorem I».
¹⁰ Loc. cit. pp. 404-406.
¹¹ Loc. cit. pp. 207-209, Theorems I, to I».
¹² Loc. cit. pp. 634-636.



THE SKEWNESS OF THE RESIDUALS IN LINEAR REGRESSION 

THEORY 

By P. S. Dwyer 

University of Michigan , Ann Arbor , Mich. 

In obtaining the regression of y on x it is customary to show the relation
between the actual and the estimated y by computing the standard deviation
of the residuals with the use of the formula $\sigma_e = \sigma_y\sqrt{1 - r^2}$. If the errors
are distributed normally one may estimate the number of values coming within
one standard deviation, within two standard deviations, etc., of the regression
line. However these errors are not always distributed normally, and in such
a case it seems wiser to compute the skewness of the residuals and to use a
Pearson Type III curve in making the interpretation. The present paper
outlines a technique for the calculation of $\alpha_{3:e}$ which is feasible from a practical
standpoint. It is based (a) on a cumulative totals method of obtaining the
correlation coefficient which, at the same time, makes possible the determination
of the third order moments needed to evaluate the skewness and (b) on an
efficient ritual for computing the coefficient of skewness from the moments.

The determination of the normality or non-normality of the residuals is not
always immediately evident. If the scatter diagram or correlation chart is
presented, one can make an estimate of the extent of normality but if not, and
the most modern and efficient computational methods do not utilize the
correlation chart, there is no way by which the presence or absence of normality can
be detected. Some research workers are opposed to the use of the more efficient
methods (particularly the use of the Hollerith tabulators) because the
correlation chart is not presented. Though within limits it is possible to use the
tabulator to present the correlation chart simultaneously with the values needed
to compute the correlation coefficient [1], it is here suggested that the
computation of the skewness of residuals, which can now be accomplished quite easily
from the tabulator runs, may be substituted for the examination of the
correlation chart.

The classical least squares theory makes use of

$$e = y - b_0 - b_1x \tag{1}$$

where $b_0$ and $b_1$ are the solutions of the normal equations. We note that the
first normal equation is $\Sigma e = 0$ so that $M_e = 0$ and the residual is a deviation.
It follows that the skewness of residuals is

$$\alpha_{3:e} = \frac{\Sigma(y - b_0 - b_1x)^3}{N\sigma_e^3}. \tag{2}$$


We wish to compute $\alpha_{3:e}$ without computing the individual residuals. The
denominator causes us little concern but it seems discouraging to evaluate such
an expression as

$$\Sigma y^3 - Nb_0^3 - b_1^3\Sigma x^3 - 3b_0\Sigma y^2 - 3b_1\Sigma xy^2 + 3b_0^2\Sigma y - 3b_0^2b_1\Sigma x + 3b_1^2\Sigma x^2y - 3b_1^2b_0\Sigma x^2 + 6b_0b_1\Sigma xy$$

even though the values of $b_0$, $b_1$, N, $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$, $\Sigma x^2y$, $\Sigma xy^2$,
$\Sigma y^3$ are available.

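A first simplification is made by summing (1) and dividing by N. We then
have

$$M_e = M_y - b_0 - b_1M_x \tag{3}$$

and by subtracting (3) from (1) and denoting deviations by barred letters,
we have

$$\bar{e} = \bar{y} - b_1\bar{x} \tag{4}$$

so that the skewness of errors is

$$\alpha_{3:e} = \frac{\Sigma\bar{y}^3 - 3b_1\Sigma\bar{x}\bar{y}^2 + 3b_1^2\Sigma\bar{x}^2\bar{y} - b_1^3\Sigma\bar{x}^3}{N\sigma_e^3}. \tag{5}$$

This formula can also be expressed as

$$\alpha_{3:e} = \frac{\mu_{03} - 3b_1\mu_{12} + 3b_1^2\mu_{21} - b_1^3\mu_{30}}{(\mu_{02} - b_1\mu_{11})^{3/2}}. \tag{6}$$

A similar formula for the skewness of the residuals of x on y is

$$\alpha_{3:e'} = \frac{\mu_{30} - 3b_1'\mu_{21} + 3b_1'^2\mu_{12} - b_1'^3\mu_{03}}{(\mu_{20} - b_1'\mu_{11})^{3/2}}. \tag{7}$$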

For theoretical purposes formula (6) may be put in standard units with
$b_1 = r\,\sigma_y/\sigma_x$, $b_1' = r\,\sigma_x/\sigma_y$, $\mu_{30} = \alpha_{30}\sigma_x^3$, $\mu_{21} = \alpha_{21}\sigma_x^2\sigma_y$, etc. with the resulting

$$\alpha_{3:e} = \frac{\alpha_{03} - 3r\alpha_{12} + 3r^2\alpha_{21} - r^3\alpha_{30}}{(1 - r^2)^{3/2}}. \tag{8}$$

As $r \to 0$, $\alpha_{3:e} \to \alpha_{3:y}$ just as $\sigma_e \to \sigma_y$ as $r \to 0$.

Formulas (6) and (7) are of some theoretical importance in that they show how
the skewness of the residuals is connected with the skewness of the marginal
distribution. Thus

as $\mu_{11} \to 0$, $b_1$ and $b_1' \to 0$ and $\alpha_{3:e} \to \alpha_{3:y}$, $\alpha_{3:e'} \to \alpha_{3:x}$;
as $b_1 \to \infty$, $\alpha_{3:e} \to -\alpha_{3:x}$ and as $b_1' \to \infty$, $\alpha_{3:e'} \to -\alpha_{3:y}$;
as $b_1 \to 1$, $\alpha_{3:e} \to \alpha_{3:y-x}$. Similarly as $b_1' \to 1$, $\alpha_{3:e'} \to \alpha_{3:x-y}$.

It is hence possible in some cases to get a good approximation to the skewness
of the residuals if the regression coefficients and the skewness of the marginal
distribution are known.

TABLE I

Correlation from first order cumulations

(Marginal totals and cumulations only. The individual cell frequencies of the
correlation chart are not legible in this copy; entries that were lost have
been restored by arithmetic from the printed totals.)

Y class      y    f_y     x_y    Cx_y     y_y    Cy_y
4.00         6     18     113     113     108     108
3.50-3.99    5    106     560     673     530     638
3.00-3.49    4    178     830    1503     712    1350
2.50-2.99    3    270    1065    2568     810    2160
2.00-2.49    2    330    1146    3714     660    2820
1.50-1.99    1    173     530    4244     173    2993
1.00-1.49    0     51     155    4399       0    2993

X class      x    f_x    Cy_x    Cx_x
4.00         8     13      61     104
3.50-3.99    7     50     259     454
3.00-3.49    6    107     661    1096
2.50-2.99    5    220    1330    2196
2.00-2.49    4    341    2194    3560
1.50-1.99    3    179    2578    4097
1.00-1.49    2    121    2809    4339
 .50- .99    1     60    2923    4399
 .00- .49    0     35    2993    4399

For actual computation, we use (6) and (7). It has been indicated previously
how the values $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$ and $\Sigma y^3$ could be obtained with the
use of cumulations. An illustration used previously [2] is presented in Table I.
The information was obtained from the Office of Educational Investigations of
the University of Michigan and gives the University first semester average (X)
and the high school average (Y) for 1,126 students entering the College of
Literature, Science, and the Arts in 1928.

The new origin of each variable is taken at the class mark of the lowest class
rather than at the class mark of a middle class as is conventional. In this way
all negative terms are avoided in the computation of the moments. The x's are
arranged in descending order from left to right and the y's in descending order
from top to bottom. The notation $x_y$ is used to indicate the sum of all the x's
having the same value of y. Thus the first entry in column 13 is 5·8 + 2·7 +
5·6 + 5·5 + 1·4 = 113. The column $Cx_y$ is obtained by cumulating the values
of $x_y$. Similarly $y_y$ is the sum of all the y's having the same value and the first
entry in column 14 is 18(6) = 108. The entries $Cy_y$, $Cy_x$, and $Cx_x$ are obtained
similarly.

The entries $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$ are found in the lower right hand box in this
position:


lx 

ly Ixy 
lx lx 8 


ly_ 

V 


The values of $\Sigma x$ and $\Sigma y$ are obtained from the final cumulations while the value
of $\Sigma xy$ is obtained by adding the entries in the column above, or, as a check,
the entries in the row to the left. The value of $\Sigma y^2$ is obtained by adding the
entries in the row at the left of the box while the value $\Sigma x^2$ is obtained by adding
the entries above the box.

The values of the third order sums are obtained by multiplying the entries
above the box and to the left of the box successively by 1, 3, 5, 7, 9, etc. Thus,

$$\begin{aligned}
\Sigma x^3 &= 4399 + 3(4339) + 5(4097) + \text{etc.} = 102{,}103, \\
\Sigma x^2y &= 2923 + 3(2809) + 5(2578) + \text{etc.} = 63{,}121, \\
\Sigma xy^2 &= 4244 + 3(3714) + 5(2568) + \text{etc.} = 46{,}047, \\
\Sigma y^3 &= 2993 + 3(2820) + 5(2160) + \text{etc.} = 38{,}633.
\end{aligned} \tag{9}$$
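The rule of multiplying the cumulations by 1, 3, 5, 7, ... can be illustrated directly. In the sketch below the cumulations are listed in the order used in (9), starting with the cumulation at code 1; the entry for the lowest class is left out since it carries weight zero. It should be noted as an assumption that several cumulations (3560, 1330 and 638 among them) do not appear legibly in this copy of Table I and were restored by arithmetic from the printed totals. The function name is illustrative.

```python
def sums_from_cumulations(cumulations):
    """Return (second order sum, third order sum) from first order cumulations.

    The unweighted total gives the second order sum; weighting the
    successive cumulations by 1, 3, 5, ... gives the third order sum.
    """
    second = sum(cumulations)
    third = sum((2 * j + 1) * c for j, c in enumerate(cumulations))
    return second, third


# Cumulations from Table I, cumulation at code 1 first.
Cx_x = [4399, 4339, 4097, 3560, 2196, 1096, 454, 104]
Cy_x = [2923, 2809, 2578, 2194, 1330, 661, 259, 61]
Cx_y = [4244, 3714, 2568, 1503, 673, 113]
Cy_y = [2993, 2820, 2160, 1350, 638, 108]

sum_x2, sum_x3 = sums_from_cumulations(Cx_x)
sum_xy, sum_x2y = sums_from_cumulations(Cy_x)
sum_xy_check, sum_xy2 = sums_from_cumulations(Cx_y)
sum_y2, sum_y3 = sums_from_cumulations(Cy_y)
print(sum_x3, sum_x2y, sum_xy2, sum_y3)
```

The weights arise because 1 + 3 + ... + (2t - 1) = t², so that a value coded t is counted t² times in all.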

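In making the reductions we use ab - cd operations as much as possible.
We first compute

$$\begin{aligned}
A_{x,y} &= N\Sigma xy - (\Sigma x)(\Sigma y), \\
A_{x,x} &= N\Sigma x^2 - (\Sigma x)^2, \\
A_{x^2,y} &= N\Sigma x^2y - (\Sigma x^2)(\Sigma y).
\end{aligned} \tag{10}$$

We note too that

$$\begin{aligned}
\mu_{30} &= [NA_{x^2,x} - (2\Sigma x)(A_{x,x})]/N^3; & \mu_{21} &= [NA_{x^2,y} - (2\Sigma x)(A_{x,y})]/N^3; \\
\mu_{12} &= [NA_{x,y^2} - (2\Sigma y)(A_{x,y})]/N^3; & \mu_{03} &= [NA_{y^2,y} - (2\Sigma y)(A_{y,y})]/N^3
\end{aligned} \tag{11}$$

and finally we get $\alpha_{3:e}$ or $\alpha_{3:e'}$ by (6) or (7).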
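The reductions (10) and (11) may be traced in exact integer arithmetic. The sketch below starts from the sums of Fig. A and reproduces the entries of Fig. B and Fig. C; the variable names are illustrative, and the divisor of (11) is taken to be N³, which is what the labels of Fig. C indicate.

```python
def ab_minus_cd(a, b, c, d):
    """One 'ab - cd' machine operation."""
    return a * b - c * d


# Sums from Fig. A (coded values).
N, Sx, Sy = 1126, 4399, 2993
Sxx, Sxy, Syy = 20245, 12815, 10069
Sxxx, Sxxy, Sxyy, Syyy = 102103, 63121, 46047, 38633

# Fig. B: the A's of (10), the total frequency N being the constant "a".
A_xx = ab_minus_cd(N, Sxx, Sx, Sx)
A_xy = ab_minus_cd(N, Sxy, Sx, Sy)
A_yy = ab_minus_cd(N, Syy, Sy, Sy)
A_x2x = ab_minus_cd(N, Sxxx, Sxx, Sx)
A_x2y = ab_minus_cd(N, Sxxy, Sxx, Sy)
A_xy2 = ab_minus_cd(N, Sxyy, Syy, Sx)
A_y2y = ab_minus_cd(N, Syyy, Syy, Sy)

# Fig. C: N^3 times the third order central moments, by (11).
N3_mu30 = ab_minus_cd(N, A_x2x, 2 * Sx, A_xx)
N3_mu21 = ab_minus_cd(N, A_x2y, 2 * Sx, A_xy)
N3_mu12 = ab_minus_cd(N, A_xy2, 2 * Sy, A_xy)
N3_mu03 = ab_minus_cd(N, A_y2y, 2 * Sy, A_yy)

print(N3_mu30, N3_mu21, N3_mu12, N3_mu03)
```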

The general solution is outlined on the left of Table II. We record in Fig. A
the values given by (9) and in Fig. B the values resulting from the application
of (10). The values $2\Sigma y$ and $2\Sigma x$ are inserted in Fig. B to facilitate the
calculation of Fig. C which gives the values of (11). The technique is very
easily carried out once it is understood. It can be performed with hand
calculators but it is ideally adapted to the use of the latest Marchant, Fridén, and
Monroe models equipped with automatic positive and negative multiplication,
so that ab - cd operations can be performed with a minimum of effort and a
maximum of accuracy. Actually the value of "a," which is the total frequency, is
the same for many of these operations so that there is further saving if a
machine is used which permits the locking in of a constant in such a way that it
can be used, without continued key punching, in later ab - cd operations.

TABLE II

Abbreviated techniques for computing third order central moments, etc.

Fig. A.

  N            Σx           Σx²          Σx³
  1126         4399         20245        102103

  Σy           Σxy          Σx²y
  2993         12815        63121

  Σy²          Σxy²
  10069        46047

  Σy³
  38633

Fig. B.

  N            2Σx          A(x,x)       A(x²,x)
  1126         8798         3444669      25910223

  2Σy          A(x,y)       A(x²,y)
  5986         1263483      10480961

  A(y,y)       A(x,y²)
  2379645      7555391

  A(y²,y)
  13364241

Fig. C.

  N                         A(x,x)       N³μ30
  1126                      3444669      -1131286764

               A(x,y)       N³μ21
               1263483      685438652

  A(y,y)       N³μ12
  2379645      944161028

  N³μ03
  803580396

Fig. D.

  N            (b1)         μ20          μ30
  1126         (.367)       2.717        -.7925

  (b1')        μ11          μ21          (-b1³), (-3b1')
  (.531)       .997         .4801        (-1.593)

  μ02          μ12          (3b1²), (3b1'²)
  1.877        .6614        (.846)

  μ03          (-3b1), (-b1'³)
  .5629        (-.150)

The values in Fig. D are obtained by dividing the values $A_{x,x}$, $A_{x,y}$, and
$A_{y,y}$ in Fig. C by $N^2$ and the values in the diagonal below, $NA_{y^2,y} - (2\Sigma y)A_{y,y}$,
etc., by $N^3$. The values $b_1 = \mu_{11}/\mu_{20}$ and $b_1' = \mu_{11}/\mu_{02}$ can be inserted in Fig. D adjacent
to the N. The value of the correlation coefficient is $r = \sqrt{b_1b_1'} = \mu_{11}/\sqrt{\mu_{20}\mu_{02}}$.
We have too, $\sigma_e = \sqrt{\mu_{02} - b_1\mu_{11}}$ and $\sigma_{e'} = \sqrt{\mu_{20} - b_1'\mu_{11}}$ so that the standard
deviation of residuals is readily computed from the entries of Fig. D. The
numerator of (6) is readily obtained after entering $-3b_1$, $3b_1^2$, $(-b_1^3)$ in the
diagonal under the diagonal containing the third moments and multiplying by
columns. The numerator of (7) is obtained by entering $-b_1'^3$, $3b_1'^2$, $-3b_1'$
in the same diagonal and multiplying by rows. The theory is applied to the
results of Table I and the details are presented at the right of Table II. It is
to be noted that all values indicated here are the coded values x, y and not the
original values, X, Y. However, the correlation coefficient and the skewness
of errors are independent of any such change in unit, grouping errors being
neglected.

From Fig. D we see that $b_1 = .997/2.717 = .367$, that $b_1' = .997/1.877 = .531$
and that $r = \sqrt{(.367)(.531)} = .441$. In this case we wish to estimate
college record, x, from high school record, y, so we use $b_1' = .531$ and compute
$-3b_1' = -1.593$, $3b_1'^2 = .846$, $-b_1'^3 = -.150$. It follows that

$$\alpha_{3:e'} = \frac{-.7925 + (.4801)(-1.593) + (.6614)(.846) + (.5629)(-.150)}{[2.717 - .531(.997)]^{3/2}} = -.334.$$
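
The arithmetic above can be checked with a few lines of Python. This is only a sketch using the rounded entries of Fig. D as quoted in the text; the function name `residual_skewness` and its argument names are illustrative and do not appear in the paper.

```python
def residual_skewness(mu20, mu11, mu30, mu21, mu12, mu03, b):
    """Skewness of the residuals x - b*y when x is estimated from y.

    The numerator is the third central moment of x - b*y expanded in the
    product moments; the denominator is the residual variance
    mu20 - b*mu11 (valid when b = mu11/mu02) raised to the power 3/2.
    """
    numerator = mu30 + mu21 * (-3 * b) + mu12 * (3 * b**2) + mu03 * (-b**3)
    residual_variance = mu20 - b * mu11
    return numerator / residual_variance**1.5


# Entries of Fig. D, with b = b_1' = .531.
alpha3 = residual_skewness(2.717, 0.997, -0.7925, 0.4801, 0.6614, 0.5629, b=0.531)
print(round(alpha3, 3))
```

The multipliers $-3b$, $3b^2$, $-b^3$ are computed here without intermediate rounding, so the result agrees with the printed value to three places only.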


It thus appears that a better picture of the variation of the residuals in this
case is obtained with the use of a Pearson Type III with $\alpha_3$ approximately $-\tfrac{1}{3}$
than is obtained with the use of a normal curve. It is not necessary, of course,
to form Fig. D as the results can all be obtained from Fig. C. Thus if we
multiply the numerator and denominator of (6) by $N^3$, we get entries, with the
exception of the b's, which are in Fig. C. Now in this case $b_1 = A_{x,y}/A_{x,x}$ and
$b_1' = A_{x,y}/A_{y,y}$ so that these values can be inserted in the upper left as before.
Also the powers of $b_1'$ can be inserted in the lower right as in Fig. D. We have then

$$\alpha_{3:e'} = \frac{-1{,}131{,}286{,}764 + (685{,}438{,}652)(-1.593) + (944{,}161{,}028)(.846) + (803{,}580{,}396)(-.150)}{[3444669 - (1263483)(.531)]^{3/2}}.$$

We know however, since the grades were coded, that it is not sensible to carry
results to more than three places (and, indeed, a three place determination of
the skewness is very satisfactory for interpretive purposes even though more
places might be obtained), so we cut down the number of places. The division
of numerator and denominator by $10^6$, and the dropping of the decimals, results in

$$\alpha_{3:e'} = \frac{-1131 + 685(-1.593) + 944(.846) + 804(-.150)}{[344 - 126(.531)]^{3/2}} = -.335.$$
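
The two computations from Fig. C can be compared directly. The following Python sketch assumes the Fig. C entries quoted in the text; `skewness_from_sums` is a hypothetical helper name. It illustrates that scaling the second-order entries by $10^{-4}$ and the third-order entries by $10^{-6}$ leaves the ratio unchanged apart from rounding.

```python
def skewness_from_sums(a_xx, a_xy, t30, t21, t12, t03, b):
    """Skewness from second-order entries (A's) and third-order entries.

    The ratio is unchanged if the A's are multiplied by k^2 and the
    third-order entries by k^3, since the denominator is a 3/2 power.
    """
    numerator = t30 - 3 * b * t21 + 3 * b**2 * t12 - b**3 * t03
    return numerator / (a_xx - b * a_xy) ** 1.5


b = 0.531
# Full entries of Fig. C.
full = skewness_from_sums(3444669, 1263483,
                          -1131286764, 685438652, 944161028, 803580396, b)
# Shortened entries: A's divided by 10^4, third-order entries by 10^6.
short = skewness_from_sums(344, 126, -1131, 685, 944, 804, b)
print(round(full, 3), round(short, 3))
```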


It is possible of course to duplicate the theory indicated in Table II with the 
use of moments rather than the A’s. In this case Fig. A consists of 1, 2 x/N, 
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2x*/N, etc. We have such formulas as a #(f • — — m mu’- mmWh , 

where a„ = o»« lV = etc. 

It would be possible to compute the $\alpha_4$ in a somewhat similar fashion though
it would take somewhat longer. In the first place we would have to compute
$\Sigma x^2y^2$ from the correlation table. This could be done by forming the cumulation
$C(y^2)$ and multiplying by 1, 3, 5, 7, 9, etc. When this is done, however,
it does not appear that the calculation of the central moments of the fourth order 
can be reduced to as simple a ritual as the calculation of the third order moments. 

The question should be raised as to the calculation of the skewness when 
there are two or more independent variables. This can be done, of course, but 
the calculations are lengthy. The point of the present paper is to provide an 
easy and simple technique for computing the skewness of residuals in the case 
of two variable linear regression. 
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON THE ADJUSTMENT OF OBSERVATIONS 

By Arthur J. Kavanagh 
The Forman Schools, Litchfield, Conn.

The method of least squares has been extended to the adjustment of observa¬ 
tions with errors in more than one variable. The history of the development 
and its principal results have been given by Deming [2], [3], [4], [5]. The basis 
is the assumption that for the “best” adjustment the sum of the weighted 
squares of all the residuals (observed values minus adjusted values) must be 
made a minimum with respect to the adjustments to the observations and with 
respect to the parameters involved in the conditions the adjusted values must 
satisfy. In certain problems, such as some arising in the study of relative 
growth in biology, this assumption is not adequate; it is necessary that the 
sum to be minimized be generalized to include cross products as well as squares 
of the residuals. 

Suppose we have a set of n universes of q-dimensional points whose centers of
gravity are known to satisfy certain conditions; for instance, they might all lie 
on a certain type of curve. A sample having been taken from each universe, 
the center of gravity of each sample is taken as the observed center of gravity 
of the corresponding universe, and it is desired to determine the most probable 
set of adjustments to the coordinates and the most probable set of parameters 
involved in the conditions, subject to the requirement that the adjusted values 
satisfy the conditions exactly. It is assumed that the sampling distribution of 
the center of gravity in each universe satisfies the multivariate normal law, and 
that the standard deviations and coefficients of correlation of each sample may 
with sufficient accuracy be taken as the constants of the corresponding universe. 
Then by reasoning analogous to that of the derivation of the least squares 
principle for one variable from the univariate normal law, the probability of 
getting the observed set of values is proportional to $e^{-Q}$, where

$$Q = \sum_{i=1}^{n} Q_i, \tag{1}$$

Qi being a homogeneous quadratic function of the errors at the ith centroid and 
in general involving the cross products as well as the squares of the errors. 

The probability will be a maximum when Q is a minimum. Consequently the
best estimates for the coordinates of the centroids will be those making Q a 
minimum, subject to the conditions which the coordinates must satisfy. 

For example it may be desired to study the relation between height and weight 
among growing boys by fitting a curve to the points whose abscissa and ordinate 
are respectively average height and average weight of a particular age group, 
one point corresponding to each age group in the study. The data for such a 
study are obtained from samples of the several age groups. Then the number 
n of universes is the number of age groups being studied, each universe con¬ 
sisting of the totality of two-dimensional points obtained by pairing the height 
with the weight of each boy in the age group. The centroid or “average point”
of each universe would ideally be obtained from measurements of all the in¬ 
dividuals of that age, but since sampling must be resorted to it is necessary to 
make allowances for the sampling distributions of the centroids. It is known 
that within each age group there is correlation between height and weight [1]. 
Consequently the sampling distribution of each centroid will exhibit a correla¬ 
tion which can be expressed in terms of the coefficient of correlation between 
height and weight of the individuals of the universe from which the sampling 
distribution arises. The existence of this correlation results in the presence of 
the cross-product term in the exponent of the bivariate normal formula de¬ 
scribing the sampling distribution of the average values, that is in the Qi of each 
centroid. If there were no such correlations the cross-product term in each Qi 
would vanish and the situation would reduce to that of least squares. 

In the general case, let $X_{1i}, X_{2i}, \cdots, X_{qi}$ be the observed coordinates of
the ith centroid, $x_{1i}, x_{2i}, \cdots, x_{qi}$ the adjusted values (to be determined), and
$V_{ji} = X_{ji} - x_{ji}$. Then $Q_i$ may be written

$$\begin{aligned}
Q_i ={}& w_{11i}V_{1i}^2 + w_{12i}V_{1i}V_{2i} + \cdots + w_{1qi}V_{1i}V_{qi} \\
 &+ w_{21i}V_{2i}V_{1i} + w_{22i}V_{2i}^2 + \cdots + w_{2qi}V_{2i}V_{qi} \\
 &+ \cdots \\
 &+ w_{q1i}V_{qi}V_{1i} + w_{q2i}V_{qi}V_{2i} + \cdots + w_{qqi}V_{qi}^2
\end{aligned} \tag{2}$$


the w's being the weights, with $w_{jki} = w_{kji}$. Thus in the case of two variables,
if $N_i$ be the number of items in the ith sample, $r_i$ its coefficient of correlation,
and $\sigma_{1i}$, $\sigma_{2i}$ its standard deviations, then

$$w_{11i} = \frac{N_i}{2(1 - r_i^2)\sigma_{1i}^2}, \qquad
w_{12i} = \frac{-N_i r_i}{2(1 - r_i^2)\sigma_{1i}\sigma_{2i}} = w_{21i}, \qquad
w_{22i} = \frac{N_i}{2(1 - r_i^2)\sigma_{2i}^2}.$$


The coefficients of the cross products in Q involve the coefficients of correla¬ 
tion of distributions. If the latter are all zero the cross products vanish and Q 
reduces to the sum of weighted squares, which is the basic expression of the 
least squares procedure. Consequently, from this point of view, the least squares 
assumption is equivalent to the assumption of zero correlation between the 
errors. The procedure in the more general situation might be called “least 
quadratics”. 
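
The two-variable weights and the quadratic form (2) can be written out as a short Python sketch. The function names `weights` and `q_i` are illustrative only; the formulas are those given above for the case q = 2.

```python
def weights(n_i, r_i, s1, s2):
    """Weights w11, w12 (= w21), w22 of Q_i for two variables."""
    c = n_i / (2.0 * (1.0 - r_i * r_i))
    return c / s1**2, -c * r_i / (s1 * s2), c / s2**2


def q_i(v1, v2, n_i, r_i, s1, s2):
    """Quadratic form (2) for q = 2, keeping both cross-product terms."""
    w11, w12, w22 = weights(n_i, r_i, s1, s2)
    return w11 * v1 * v1 + w12 * v1 * v2 + w12 * v2 * v1 + w22 * v2 * v2


# With zero correlation the cross products vanish and Q_i is a weighted
# sum of squares, as in ordinary least squares.
print(weights(100, 0.0, 2.0, 5.0))
print(q_i(1.0, 1.0, 100, 0.0, 2.0, 5.0))
```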
The Lagrange method of undetermined multipliers can be used to calculate
the values of the adjustments to the coordinates and the values of the param¬ 
eters. The procedure is the same as for least squares [2], [3], [5], the only 
difference being the somewhat greater complication of the algebra. We shall 
summarize the development here. 

The condition equations, supposed $\nu$ in number, may be written

$$F^h(x_{11}, \cdots, x_{qn};\ p_1, p_2, \cdots, p_r) = 0, \qquad h = 1, 2, \cdots, \nu,$$

where each $F^h$ may in general involve any or all of the numbers $x_{ji}$ as well as
any or all of the parameters $p_l$, whose number we suppose to be r. Let

$$F^h_{ji} = \partial F^h/\partial x_{ji}, \qquad F^h_l = \partial F^h/\partial p_l \tag{3}$$

where the X's have been substituted for the x's after differentiation, and each
$p_l$ has been replaced by the best available approximate value $p_{l0}$. Let $F^h_0$ be
the value of $F^h$ after the same substitution. Also let $v_l = p_{l0} - p_l$. Then if
the V's and v's are small the conditions may be written

$$\sum_{j}\sum_{i} F^h_{ji}V_{ji} + \sum_{l} F^h_l v_l = F^h_0, \qquad h = 1, 2, \cdots, \nu. \tag{4}$$

Differentiate Q with respect to the V's and equate the result to zero, eliminating
the factor 2. Differentiate (4) with respect to the V's and the v's, multiply
each equation by the corresponding undetermined multiplier $-\lambda_h$, and sum
the results together with the result from differentiating Q. Collecting coefficients
of the differentials $\delta V_{ji}$ and $\delta v_l$, equating to zero and transposing the
terms involving $\lambda_h$, we get

$$\begin{aligned}
w_{11i}V_{1i} + w_{12i}V_{2i} + \cdots + w_{1qi}V_{qi} &= [\lambda_h F^h_{1i}] \\
w_{21i}V_{1i} + w_{22i}V_{2i} + \cdots + w_{2qi}V_{qi} &= [\lambda_h F^h_{2i}] \qquad (i = 1, 2, \cdots, n) \\
\cdots \qquad & \\
w_{q1i}V_{1i} + w_{q2i}V_{2i} + \cdots + w_{qqi}V_{qi} &= [\lambda_h F^h_{qi}]
\end{aligned} \tag{5}$$

$$[\lambda_h F^h_l] = 0, \qquad l = 1, 2, \cdots, r, \tag{6}$$

where the brackets denote summation with respect to h.

Equations (5) can be written down easily, since the coefficients $w_{jki}$ appear
in the same order as in (2). The equations corresponding to each i form a
complete set which can be solved independently of those for other values of i.
The solution can be expressed

$$V_{ji} = A_{1ji}[\lambda_h F^h_{1i}] + A_{2ji}[\lambda_h F^h_{2i}] + \cdots + A_{qji}[\lambda_h F^h_{qi}] \tag{7}$$

where $A_{kji}$ is $(-1)^{k+j}$ times the minor corresponding to $w_{kji}$, divided by the
principal determinant. By symmetry $A_{kji} = A_{jki}$.
The V's in (4) are to be replaced by their values from (7) and the coefficients
of the $\lambda$'s collected. To facilitate this let

$$L_{hk} = \sum_{i=1}^{n} L_{hki}$$

where

$$L_{hki} = \sum_{s=1}^{q}\sum_{r=1}^{q} A_{rsi}F^h_{ri}F^k_{si}.$$

Each $L_{hki}$ can be written down easily from the corresponding $Q_i$ as written in
(2): in each term $w_{rsi}V_{ri}V_{si}$ replace $w_{rsi}$ by $A_{rsi}$, $V_{ri}$ by $F^h_{ri}$, and $V_{si}$ by $F^k_{si}$.
It is important to preserve the order of the subscripts of the V's in (2), and to
treat the diagonal terms $w_{rri}V_{ri}^2$ as though written $w_{rri}V_{ri}V_{ri}$. It is seen that
$L_{hki} = L_{khi}$, and $L_{hk} = L_{kh}$. Then the substitution from (7) into (4) gives

$$\sum_{k=1}^{\nu} L_{hk}\lambda_k + \sum_{l=1}^{r} F^h_l v_l = F^h_0, \qquad h = 1, 2, \cdots, \nu. \tag{8}$$

Equations (8), with (6), are formally identical with those of the least squares
procedure which are called by Deming the “general normal equations”, and
they can be written schematically in the same manner. The further procedure
is identical with that for least squares, involving solution of the general normal
equations for the $\lambda$'s and v's, substitution of the values of the $\lambda$'s into (7) to
obtain the V's, and then adjustment of the observations by use of the V's,
and adjustment of the provisional values of the parameters by use of the v's.
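
As an illustration only, the procedure can be sketched in Python for the simplest case: two variables, with one condition per centroid requiring the adjusted point to lie on a straight line $x_2 - a - b x_1 = 0$. This special case, the function name `fit_line_least_quadratics`, and the data layout are assumptions made for the example and are not taken from the note. Because each condition involves a single centroid, $L_{hk}$ is diagonal and the multipliers can be eliminated by hand.

```python
def fit_line_least_quadratics(X, W, a0, b0):
    """One linearised step for the conditions x2 - a - b*x1 = 0.

    X : list of observed centroids (X1, X2)
    W : list of weight triples (w11, w12, w22), with w21 = w12
    a0, b0 : provisional values of the parameters
    Returns the adjusted (a, b) and the residuals V = X - x per centroid.
    """
    rows = []
    for (X1, X2), (w11, w12, w22) in zip(X, W):
        det = w11 * w22 - w12 * w12
        # A_i: inverse of the weight matrix of centroid i, as in (7).
        A11, A12, A22 = w22 / det, -w12 / det, w11 / det
        f1, f2 = -b0, 1.0           # dF/dx1, dF/dx2 at the observed point
        Fa, Fb = -1.0, -X1          # dF/da, dF/db
        F0 = X2 - a0 - b0 * X1      # condition evaluated at the observations
        L = A11 * f1 * f1 + 2.0 * A12 * f1 * f2 + A22 * f2 * f2
        rows.append((A11, A12, A22, f1, f2, Fa, Fb, F0, L))

    # From (8): lambda_i = (F0 - Fa*va - Fb*vb) / L_i.  Substituting into
    # (6), sum_i lambda_i * F_l = 0, gives two equations for va and vb.
    Saa = sum(t[5] * t[5] / t[8] for t in rows)
    Sab = sum(t[5] * t[6] / t[8] for t in rows)
    Sbb = sum(t[6] * t[6] / t[8] for t in rows)
    Ta = sum(t[5] * t[7] / t[8] for t in rows)
    Tb = sum(t[6] * t[7] / t[8] for t in rows)
    D = Saa * Sbb - Sab * Sab
    va = (Ta * Sbb - Tb * Sab) / D
    vb = (Saa * Tb - Sab * Ta) / D

    V = []
    for A11, A12, A22, f1, f2, Fa, Fb, F0, L in rows:
        lam = (F0 - Fa * va - Fb * vb) / L
        # Equation (7): V_j = sum_k A_kj * lambda * F_k.
        V.append((lam * (A11 * f1 + A12 * f2), lam * (A12 * f1 + A22 * f2)))
    return a0 - va, b0 - vb, V


# Points lying exactly on x2 = 1 + 2*x1 need no adjustment.
X = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
W = [(1.0, 0.3, 2.0)] * 4
a, b, V = fit_line_least_quadratics(X, W, a0=0.0, b0=1.0)
```

In practice the step would be repeated with the new values of a and b as provisional values until the corrections are negligible, since the condition is not linear in the slope and the abscissa jointly.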

A word of appreciation is due Dr. O. W. Richards of The Spencer Lens Company
for calling this problem to my attention, and for encouragement in the
carrying out of the solution.
THE ESTIMATION OF A QUOTIENT WHEN THE DENOMINATOR
IS NORMALLY DISTRIBUTED

By Robert D. Gordon

Scripps Institution of Oceanography, La Jolla, Calif.

1. Introduction. In an oceanographic investigation we have to deal with a 
time series consisting of single pairs of observed values x, y, of two independent 
stochastic variables, whose true (mean) values we shall denote respectively by 
a, b. Of interest is the corresponding time series of quotients (b/a), which it 
is required to estimate from the observations x, y. Both x and y are approxi¬ 
mately normally distributed about their mean values a, b with rather large 
variances $\sigma_x^2$, $\sigma_y^2$ which can be estimated. It is easily possible for x to vanish
or even to be of opposite sign to a, although a cannot itself vanish. The re¬ 
quired estimates of (b/a) should have the property that they can be numerically 
integrated, i.e. that an arbitrary sum of such estimates shall equal the corre¬ 
sponding estimate of the true sum. 

Let us define a function $\gamma(x)$ to have the property that its mathematical
expectation $E\{\gamma(x)\}$ is exactly 1/a, where a = E(x). If such a function exists
we shall have

$$E\{y\cdot\gamma(x)\} = E(y)\cdot E\{\gamma(x)\} = b\cdot(1/a) = b/a \tag{1}$$

so that $y\cdot\gamma(x)$ will be an estimate of b/a which has the required property:
namely such estimates can be added, and we have

$$E\{y_1\gamma(x_1) + y_2\gamma(x_2)\} = E\{y_1\gamma(x_1)\} + E\{y_2\gamma(x_2)\} = b_1/a_1 + b_2/a_2$$

as required. It turns out that if x is normally distributed with non-zero mean
such a function $\gamma(x)$ does exist, and is given by the formula

$$\gamma(x) = \frac{1}{\sigma_x}\exp\left(\frac{x^2}{2\sigma_x^2}\right)\int_{x/\sigma_x}^{\infty} e^{-t^2/2}\,dt = \frac{1}{\sigma_x}R_{x/\sigma_x} \tag{2}$$

where $R_u$ is the “ratio of the area to the bounding ordinate” which is tabulated
by J. P. Mills,$^1$ also in Pearson's tables.$^2$ Equation (2) holds if a is positive; if
a is negative the integration should extend over $(x/\sigma_x, -\infty)$. It is easy to
verify that

$$E\{\gamma(x)\} = \frac{1}{\sqrt{2\pi}\,\sigma_x}\int_{-\infty}^{\infty}\gamma(x)\exp\left(-\frac{(x-a)^2}{2\sigma_x^2}\right)dx = \frac{1}{a} \tag{3}$$

by direct substitution from (2).
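
Relation (3) can also be checked numerically. The Python sketch below is an illustration under stated assumptions: it treats the case a > 0 only, evaluates the tail integral in (2) with the standard library's complementary error function, and approximates the expectation by the trapezoidal rule over a finite range of ten standard deviations on either side of a. The names `gamma_est` and `expected_gamma` are hypothetical.

```python
import math


def gamma_est(x, sigma):
    """gamma(x) = (1/sigma) * R(x/sigma), R being Mills' ratio (case a > 0)."""
    u = x / sigma
    # Integral of exp(-t^2/2) from u to infinity, via erfc.
    tail = math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))
    return math.exp(0.5 * u * u) * tail / sigma


def expected_gamma(a, sigma, half_width=10.0, steps=20000):
    """Trapezoidal estimate of E{gamma(x)} for x normal with mean a."""
    lo = a - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        density = math.exp(-0.5 * ((x - a) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * gamma_est(x, sigma) * density
    return total * h


print(expected_gamma(2.0, 1.0))   # close to 1/a = 0.5
```

When a is small compared with $\sigma_x$ the integrand decays slowly for negative x, so a wider range of integration would be needed than the one assumed here.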

1 J. P. Mills, “Table of ratio: area to bounding ordinate, for any portion of the normal
curve,” Biometrika, Vol. 18 (1926), pp. 395-400.

2 Karl Pearson, Tables for Statisticians and Biometricians, part II, table III, Cambridge
Univ. Press.
2. The law of large numbers for $\gamma(x)$. The function $\gamma(x)$ defined by (2) has
mean value 1/a as required, but its second moment (hence variance) does not
exist, as may readily be verified. By a theorem of Khinchine$^3$ however, its
values satisfy a law of large numbers. It will be of interest to inquire about the
“strength” of this law of large numbers for $\gamma(x)$. Namely, given a positive
number $\epsilon$, how many “observations” (independent estimates) $\gamma(x)$ will suffice
to guarantee probabilities of .50, .90, .95, etc. for the following inequality to
hold

$$\left|\frac{\gamma(x_1) + \gamma(x_2) + \cdots + \gamma(x_n)}{n} - \frac{1}{a}\right| < \epsilon \tag{4}$$


where n is the number of “observations.” 

In order to arrive at a rough answer to this question we have made use of
certain inequalities due to Tshebysheff (Tshebysheff's “method of moments”,
cf. Uspensky$^4$). Let u be an arbitrary stochastic variable whose distribution
has moments of the first and second order which are known. Denote by m its
first moment, by $\sigma^2$ its variance, then it results from Tshebysheff's theory that
the probability $P(u_1, u_2)$ for a value of u to lie between $u_1$ and $u_2$ (i.e.
$u_1 < u < u_2$) satisfies the inequality

$$P(u_1, u_2) > 1 - \frac{\sigma^2}{(u_1 - m)^2 + \sigma^2} - \frac{\sigma^2}{(u_2 - m)^2 + \sigma^2}. \tag{5}$$

This inequality is independent of the values, or even the existence, of further
moments of the u-distribution beyond the second, and depends only on the
condition that the cumulant of the distribution function shall have at least three
“points of increase.”
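
For concreteness, the bound (5) can be evaluated as follows. This Python sketch assumes, as the one-sided form of the inequality requires, that the mean m lies strictly between $u_1$ and $u_2$; the function name `tshebysheff_bound` is illustrative.

```python
def tshebysheff_bound(u1, u2, m, var):
    """Right-hand side of (5): lower bound for P(u1 < u < u2).

    m is the first moment and var the variance of u; the bound is only
    meaningful when u1 < m < u2 (an assumption checked here).
    """
    if not (u1 < m < u2):
        raise ValueError("the mean must lie strictly between u1 and u2")
    return 1.0 - var / ((u1 - m) ** 2 + var) - var / ((u2 - m) ** 2 + var)


# Limits three standard deviations either side of the mean.
print(tshebysheff_bound(-3.0, 3.0, 0.0, 1.0))
# For the mean of n independent values, the variance is divided by n.
print(tshebysheff_bound(-3.0, 3.0, 0.0, 1.0 / 4))
```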

Although $\gamma(x)$ does not have a second moment, a second moment does exist
for those values of $\gamma(x)$ which correspond to $x \ge -\theta > -\infty$, where $\theta$ is an
arbitrary number, positive or negative. If we can estimate the first two moments
of $\gamma(x) \sim 1/x$ corresponding to a given value of $\theta$, then for a given number
n of observations we need only to divide the corresponding variance by n to
obtain $\sigma^2$ in (5), then multiply (5) by the nth power of the (normal) probability
for the inequality $x \ge -\theta$, in order to obtain a lower bound for the probability
of the inequality (4). $\theta$ is to be determined so as to yield a maximum result.

The first moment $m_1$ of $\gamma(x)$ for values of $x \ge -\theta$ is easily computed, and is
given by the formula

(6) a xmi (e) = ^(l - 


3 J. V. Uspensky, Introduction to Mathematical Probability, p. 195, McGraw-Hill (1937).
4 J. V. Uspensky, l.c. pp. 365 ff.
The second moment is harder to compute, but if we place

4(0) « K-(mt - m\) » £ [?(*)]* exp —-jy" ■ ^ dx 

<7) [Z> fa) ^ (-^H 

VU'-L ** P (-^^.')«b 

where 


L- fexp(-<L=^V 

2a- «r, *L* V 2«rJ / 


we easily obtain the relationship 
1 


-1- f 

V^2ir *-<*+«)/*• 


-«*n 


dt 


<8) * ,(#) K- - 7 (‘ “ E 


*u.. \v 

<♦+«)/».// * 



From (7), using a table of the probability integral, it can be verified that
$\psi(-a - 3\sigma_x) \ll 0.001$. Assume, therefore, as a boundary condition
$\psi(-a - 3\sigma_x) = 0$; then (8) can be integrated graphically or numerically. It is by this
means that the curves shown in Figs. 1 and 2 were determined. Computations
were also attempted for $a/\sigma_x = \tfrac{1}{2}$, $a/\sigma_x = 1$, but it was not possible to obtain
significant results: it would be necessary in these cases to take more than two
moments into account, which would lead to hopeless complications. In these
figures the ordinates represent probabilities for an observation to fall between
.90a and 1.11a (Figure 1), and between .76a and 1.88a (Figure 2), respectively.
3. Two practical formulas for computations. It seems worthwhile to note
here two simple formulas in connection with Mills' ratio (2) which will be useful
for computations. The first is the obvious relationship

$$R_{-u} = \sqrt{2\pi}\,e^{u^2/2} - R_u = 1/z - R_u \tag{9}$$

in the notation of Pearson's tables. The second applies to large values of x,
and may be written

$$\frac{x}{x^2 + \sigma_x^2} < \gamma(x) = \frac{1}{\sigma_x}R_{x/\sigma_x} < \frac{1}{x}. \tag{10}$$

(10) is true for x > 0, and can be proved by means of the differential equation
which $\gamma(x)$ satisfies.
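
Both formulas are easy to try numerically. The Python sketch below evaluates Mills' ratio through the standard library's `math.erfc`; the function names are illustrative, and the direct evaluation is suitable only for moderate arguments, since the exponential factor overflows for very large ones.

```python
import math


def mills_ratio(u):
    """R_u = exp(u^2/2) * integral of exp(-t^2/2) from u to infinity."""
    return math.exp(0.5 * u * u) * math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))


def mills_ratio_negative(u):
    """R_{-u} obtained from R_u by relation (9)."""
    return math.sqrt(2.0 * math.pi) * math.exp(0.5 * u * u) - mills_ratio(u)


def gamma_bounds(x, sigma):
    """Lower bound, gamma(x), and upper bound of (10), for x > 0."""
    g = mills_ratio(x / sigma) / sigma
    return x / (x * x + sigma * sigma), g, 1.0 / x


lower, g, upper = gamma_bounds(3.0, 1.0)
print(lower, g, upper)
```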

4. Remarks. The estimate $\gamma(x)$ has the following inadequacy: If only a single
observation x is known, then it is unknown whether a is of like or unlike sign
compared to x. It turns out then that the mathematical expectation for the
value of $\gamma(x)$ vanishes identically. This difficulty of course disappears if more
than one observation is available. Methods of avoiding this difficulty for time
series, e.g. by noting relative frequencies for observations separated by 1, 2, 3
etc. intervals to agree in sign, will be discussed elsewhere in connection with
practical applications.

It may be worthwhile to note that Geary$^5$ developed certain characteristics
of the distribution of a quotient, which however are not adapted to our purposes.


NOTE ON CONFIDENCE LIMITS FOR CONTINUOUS DISTRIBUTION 

FUNCTIONS 

By A. Wald* and J. Wolfowitz 

In a recent paper [1] we discussed the following problem: Let X be a stochastic
variable with the cumulative distribution function f(x), about which nothing is
known except that it is continuous. Let $x_1, \cdots, x_n$ be n independent, random
observations on X. The question is to give confidence limits for f(x). We
gave a theoretical solution when the confidence set is a particularly simple and 
important one, a “belt.” 

A particularly simple and expedient way from the practical point of view is 
to construct these belts of uniform thickness ([1], p. 115, equation 50). If the 
appropriate tables, as mentioned in our paper, were available, the construction 
of confidence limits, no matter how large the size of the sample, would be im¬ 
mediate. 

Our formulas (11), (16), (19), (27) and (29) are not very practical for computa¬ 
tion, particularly when the samples are large. We have recently learned that 

5 Geary, R. C., “The Frequency Distribution of a Quotient,” Jour. Roy. Stat. Soc.,
Vol. 93 (1930), pp. 442-446.

* Columbia University, New York City.
there exists a result by Kolmogoroff [2], generalised by Smirnoff [3], which for
large samples gives an easy method for constructing tables, i.e. of finding $\alpha$
when c and n are given (all notations as in [1]). The result of Kolmogoroff-
Smirnoff is:

Let $c = \lambda/\sqrt{n}$. Then for any fixed $\lambda > 0$,

$$\lim_{n\to\infty} \overline{P} = \lim_{n\to\infty} \underline{P} = 1 - e^{-2\lambda^2},$$

$$\lim_{n\to\infty} P = 1 - 2\sum_{m=1}^{\infty} (-1)^{m-1} e^{-2m^2\lambda^2}.$$

This series converges very rapidly.
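
To indicate how such a table might be built from the limiting formulas, here is a short Python sketch. It is an illustration only: the function names are hypothetical, the series is simply truncated after a fixed number of terms, and the inversion is done by bisection on an interval assumed to contain the root.

```python
import math


def one_sided_limit(lam):
    """Limiting confidence coefficient 1 - exp(-2*lam^2) for a one-sided belt."""
    return 1.0 - math.exp(-2.0 * lam * lam)


def two_sided_limit(lam, terms=100):
    """Limiting confidence coefficient for the two-sided belt (alternating series)."""
    s = 0.0
    for m in range(1, terms + 1):
        s += (-1) ** (m - 1) * math.exp(-2.0 * m * m * lam * lam)
    return 1.0 - 2.0 * s


def lambda_for(alpha, limit=two_sided_limit, lo=0.2, hi=5.0):
    """Solve limit(lam) = alpha by bisection; limit must increase on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if limit(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = lambda_for(0.95)
n = 100
print(lam, lam / math.sqrt(n))   # lambda and the half-width c for n = 100
```

For $\lambda$ near 1.36 the second term of the series is already below $10^{-6}$, which is the sense in which the convergence is rapid.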

REFERENCES 

[1] Wald and Wolfowitz, “Confidence limits for continuous distribution functions,”
Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[2] A. Kolmogoroff, “Sulla determinazione empirica di una legge di distribuzione,”
Giornale dell'Istituto Italiano degli Attuari, Vol. 11 (1933).

[3] N. Smirnoff, “Sur les écarts de la courbe de distribution empirique,” Recueil
Mathématique (Matematicheskii Sbornik), New series, Vol. 6(48) (1939), pp. 3-26.

1 In the French résumé of Smirnoff's article, on page 26, due to a typographical error
this formula is given with a factor $(-1)^m$ instead of the correct factor $(-1)^{m-1}$. The
correct result follows from equation (112), page 23, of the Russian text when t is set equal
to zero.



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 

The Sixth Annual Meeting of the Institute of Mathematical Statistics was 
held at the Stevens Hotel, Chicago, Thursday to Saturday, December 26 to 28, 
1940 in conjunction with the meetings of the American Statistical Association, 
the Econometric Society, and the American Marketing Association. The fol¬ 
lowing fifty members of the Institute attended the meeting: 

H. E. Arnold, C. S. Barrett, A. G. Brooks, R. W. Burgess, A. G. Clark, A. C. Cohen, Jr.,
W. G. Cochran, A. T. Craig, C. C. Craig, B. B. Day, W. E. Deming, J. L. Doob, P. S. Dwyer,
Churchill Eisenhart, J. W. Fertig, P. G. Fox, Hilda Geiringer, E. J. Gumbel, Myron
Heidingsfield, Harold Hotelling, Leo Katz, J. F. Kenney, L. F. Knudsen, Alma Kohl,
T. Koopmans, D. H. Leavens, Ida Levin, G. A. Lundberg, S. N. Lyttle, W. G. Madow, Ralph
Mansfield, G. F. T. Mayer, J. R. Miner, E. C. Molina, C. R. Mummery, J. I. Northam,
E. G. Olds, P. S. Olmstead, A. L. O’Toole, J. A. Pierce, Wilhelm Reitz, P. R. Rider,
M. M. Sandomire, L. W. Shaw, W. A. Shewhart, F. F. Stephan, S. A. Stouffer, A. G. Swanson,
S. S. Wilks, M. O. Woodbury.

The opening session, on Thursday afternoon, was devoted to contributed 
papers in probability and statistical methodology. The Chairman was Professor 
S. S. Wilks of Princeton University, and the following papers were presented: 

1. On the Calculation of the Probability Integral on Non-Central t and an Application. 

C. C. Craig, University of Michigan. 

2. Effective Methods of Graduation. 

Max Sasuly, Office of the Actuary, Social Security Board. 

3. On Some New Results in the Sampling of Discrete Random Variables. 

William G. Madow, Bureau of the Census. 

4. On the Use of Inverse Probability in Sample Inspection. 

W. Edwards Deming and W. G. Madow, Bureau of the Census. 

5. On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table when 

Some of the Marginal Totals are Known. 

F. F. Stephan, Cornell University, and W. Edwards Deming, Bureau of the Census. 

6. The Return Period of Flood Flows. 

E. J. Gumbel, New School for Social Research, New York City. 

7. A Note on the Power of a Sign Test. 

W. M. Stewart, University of Wisconsin. 

8. A New Explanation of Non-Normal Dispersion. 

Hilda Geiringer, Bryn Mawr College. 

Abstracts of these papers follow this report. 

On Friday morning a session was held jointly with the American Marketing 
Association on The Theory and Application of Representative Sampling. Under
the chairmanship of Professor Theodore H. Brown of Harvard University, the
following papers were presented:

1. Background and Method.

F. F. Stephan, Cornell University. 
2. Application to Marketing Problems.

Archibald M. Crossley, New York City. 

3. Application to Agricultural Problems. 

Arnold J. King, Iowa State College. 

The afternoon session on Friday was held jointly with the American Statistical
Association and Econometric Society on The Analysis of Variance. The
chair was held by Professor P. R. Rider of Washington University and the fol¬ 
lowing papers were presented: 

1. The Relation Between the Design of an Experiment and the Analysis of Variance. 

A. E. Brandt, Soil Conservation Service. 

2. The Underlying Principles of the Analysis of Variance and Associated Tests of 
Significance. 

Churchill Eisenhart, University of Wisconsin. 

3. The Applications of the Analysis of Variance to Non-Orthogonal Data.

W. G. Cochran, Iowa State College. 

Discussion: 

Gertrude M. Cox, North Carolina State College. 

John F. Kenney, University of Wisconsin Extension Division. 

W. Edwards Deming, Bureau of the Census. 

On Saturday morning and afternoon, sessions were held with the American 
Statistical Association on Collection and Use of Statistics for Quality Control in 
National Defense Industries. At the morning session the following papers were 
given, with Dr. C. W. Gates of the Western Electric Company in the chair: 

1. Report on the Quality Control Program of the American Standards Association. 

John Gaillard, Western Electric Company. 

2. Sample Verification in the Administration of the Population Census. 

W. Edwards Deming, Bureau of the Census. 

3. The Importance of the Statistical Viewpoint in High Production Manufacturing. 

P. L. Alger, General Electric Company. (Read by C. Eisenhart.) 

4. On the Initiation of Statistical Methods for Quality Control in Industry. 

Leslie E. Simon, Aberdeen Proving Ground. 

At the afternoon session the following papers were presented under the chair¬ 
manship of Dr. John Johnston of the United States Steel Corporation: 

1. The Place of Statistical Analysis in Ferrous Metallurgy. 

E. M. Schrock, Jones and Laughlin Steel Corporation. 

2. Statistical Methods in the Production and Inspection of Cast Iron Pipe. 

J. T. MacKenzie, American Cast Iron Pipe Company. 

3. Applications of Statistical Methods to Metallurgy. 

R. B. Mears, Aluminum Company of America. 

Discussion: 

Churchill Eisenhart, University of Wisconsin. 

The annual business meeting of the Institute was held on Thursday afternoon 
after the session on probability and statistical methodology, with the President 
presiding. 

The Secretary-Treasurer read the financial report for 1940. 
The Editor of the Annals of Mathematical Statistics reported on the progress
of the Annals during 1940. It was stated that manuscripts worthy of publica¬ 
tion were now being submitted at a rate that would justify the publication of 
a 500-page annual volume. To make this amount of publication self-supporting 
upon the expiration of the Rockefeller grant in June, 1941, it was pointed out 
that another 150 new subscriptions would have to be obtained during 1941. 
Judging from the rate at which subscriptions had been coming in during the 
past two years such an increase was considered entirely feasible with the coopera¬ 
tion of the members of the Institute. Various methods of effecting this increase 
were discussed at the meeting and suggested for the consideration of the Board 
of Directors. 

On behalf of the Board of Directors the President made the following report: 

1. The Report of the War Preparedness Committee, approved in preliminary 
form at the Hanover meeting, had been preprinted and some of the preprints 
had already been distributed. 

2. Arrangements had been made with the Executive Officer of the National 
Roster of Scientific and Specialized Personnel to send the statistics check list 
to all members of the Institute who are not members of the American Statistical 
Association. 

3. That preprints of the pamphlet on The Teaching of Statistics, including an 
address by Professor Harold Hotelling, discussion by Dr. W. E. Deming and 
the resolutions on the teaching of statistics adopted by the Institute at the 
Dartmouth meeting had been produced and distributed. 

4. That application 1 had been made to the Executive Committee of the Ameri¬ 
can Association for the Advancement of Science through the Permanent Secre¬ 
tary for admission to the status of an affiliated society in the Association. 

It was announced that through the annual election, carried out by mail ballot, 
the following officers were elected for 1941 (all names being those proposed by 
the Nominating Committee): 

President: Professor Harold Hotelling 

Vice-Presidents: Professor A. T. Craig 
Professor H. C. Carver 

Secretary-Treasurer: Professor E. G. Olds 

The annual luncheon was held at noon on Friday with the President-Elect 
presiding. Short talks were made by Dr. E. J. Gumbel, Dr. T. Koopmans and 
Professor S. S. Wilks, while the annual luncheon address was delivered by 
Professor P. R. Rider. 

P. R. Rider, 
Secretary-Treasurer. 

1 This application was approved by the Executive Committee of the A.A.A.S. at its 
December 1940 meeting.
(Presented on December 26, 1940, at the Chicago meeting of the Institute)

On the Calculation of the Probability Integral on Non-Central t and an Appli¬ 
cation. C. C. Craig, University of Michigan. 

It seems not to have been noted that the probability integral for non-central t can be
calculated by means of an infinite series in incomplete β-functions which converges rapidly
for small samples. The application here considered is to a test based on the randomization
principle which is the subject of E. J. G. Pitman’s paper: Significance tests which may be
applied to samples from any populations (Roy. Stat. Soc. Jour., Vol. 4 (1937), pp. 119-130).
In case the samples come from normal populations with equal variance but with unequal 
means, the chance that the hypothesis of equal population means will be accepted on this 
test is given by this probability integral which is evaluated in some illustrative numerical 
examples. 

On Some New Results in the Sampling of Discrete Random Variables. Wil¬ 
liam G. Madow, Bureau of the Census. 

Many statistical tables may be regarded as the result of subsampling finite populations 
classified into r X s X • • • tables. The main aim of this paper is to derive the associated 
statistical theory including both the finite and limiting distributions. After evaluating 
the fundamental distributions and the moments it is shown that under certain conditions, 
the limiting distribution is multinomial, while under other conditions the limiting distribu¬ 
tion is multivariate normal. These results are then applied to determine the adequate size 
of sample, and the sampling proportions from various strata. 

On the Use of Inverse Probability in Sample Inspection. W. Edwards Dem- 
ing and William G. Madow, Bureau of the Census. 

The theory of inspection by sampling is abstractly equivalent to one part of the theory 
of subsampling. The theory of subsampling finite populations is considered in this paper 
in order to investigate the differences that occur when the methods of fiducial inference and 
inverse probability are used, particularly in regard to determining the adequate size of 
sample. In sample inspection, the prior distribution of failures is almost always known, 
at least approximately. In using any system of sample inspection, a number of failures will 
pass undetected. On the basis of certain prior distributions of failures, distributions are 
derived for the number and percent of failures remaining after each of several different 
possible systems of sample inspection has been applied. Formulas giving the cost of partial 
inspection are used together with these distributions in order to determine methods of 
sample inspection having various desired properties. 

On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table 
When Some of the Marginal Totals are Known. Frederick F. Stephan, 
Cornell University and W. Edwards Deming, Bureau of the Census. 

The 5 per cent sample taken with the 1940 Population Census presents an interesting 
problem of estimation in which the estimates are connected by equations of condition.
These equations arise from the fact that certain sums of estimates derived from the sample
should equal the corresponding frequencies derived from the tabulations of the census 
enumeration, i.e. the distribution of each of several variables may be known but their 
joint distribution may only be estimated from a cross tabulation of the data furnished by 
the sample. The adjustment of the sample estimates is accomplished by the principle of 
least squares and an outline of the various types of conditions for two and three variables 
is presented. The solution of the normal and condition equations is tedious when hundreds 
of sets of estimates must be adjusted but a simple iterative procedure is available (see 
Annals of Math. Stat., Vol. 11 (1940), pp. 427-444).

The Return Period of Flood Flows. E. J. Gumbel, New School for Social 
Research (N. Y.) 

For any statistical variable the return period is defined as the mean number of trials 
necessary in order that a certain value of the variable or a greater one returns. The return 
period is a theoretical statistical function such as the distribution or the probability. In 
hydraulics the corresponding observed values are the recurrence and exceedance intervals. 

The main thesis is that the flood flows are the largest values of flows which have to be considered
as unlimited variables. The method of return periods applied to the largest values
leads without further assumptions to a formula which gives the return period f(x) of a flood
superior to x, and at the same time the most probable flood to be reached not at a certain
time, but within a certain period. This formula contains only two constants, which are
linear functions of the mean annual flood and the standard deviation. Fuller’s formula
turns out to be an asymptotic expression of my formula.

This method applied to the Connecticut, Columbia, Merrimack, Cumberland, Tennessee 
and Mississippi rivers shows a very good fit between theory and observation, superior to 
the methods applied heretofore. 

A Note on the Power of the Sign Test. W. M. Stewart, University of Wisconsin.

Let us consider a set of N non-zero differences, of which x are positive and N − x are
negative; and suppose that the hypothesis tested, $H_0$, implies in independent sampling
that x will be distributed about an expected value of N/2 in accordance with the binomial
$(\frac{1}{2} + \frac{1}{2})^N$. As a quick test of $H_0$, we may choose to test the hypothesis $h_0$ that x has the
above probability distribution. Defining r to be the smaller of x and N − x, the test consists
in rejecting $h_0$ and therefore $H_0$ whenever $r < r(\epsilon, N)$, where $r(\epsilon, N)$ is determined by
N and the significance level $\epsilon$.

In applying such a test it is of interest to know how frequently it will lead to a rejection
of $H_0$ when $H_0$ is false and the actual situation $H_1$ implies that the probability law of x is
$(q + p)^N$, with $p \neq \frac{1}{2}$, thereby indicating an expectation of an unequal number of + and −
differences. The probability of rejecting $H_0$ when $H_1$, implying $p = p_1$, is true, is termed the
power of the test of $H_0$ relative to the alternative $H_1$.

A table is given for the 5% significance level ($\epsilon = .05$) showing the minimum value of
N for which the power of the test relative to $p = p_1$ exceeds $\beta$ for values of $\beta$ from .05 to .95
at intervals of .05; and for $p_1$ from .60 to .95 (and thereby for $p_1$ from .40 to .05) at intervals
of .05. The case of $\beta > .99$ is also considered for these values of $p_1$.

A New Explanation of Non-Normal Dispersion. Hilda Geiringer, Bryn
Mawr College.

The starting point of the Lexis theory consists in this fact: It is to be expected, on the
average, that two expressions $\Sigma$ and $\Sigma'$ which can be computed from the results of $m \cdot n$
observations are equal, provided that the corresponding $m \cdot n$ chance variables $x_{\mu\nu}$ are
equally and independently distributed. Let $a_\mu$ be the average $a_\mu = (1/n) \sum_\nu x_{\mu\nu}$ and $a$ the average
of the $a_\mu$ ($\mu = 1, \cdots, m$). Then

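[Two displayed formulas, illegible in the source: the first is built on the double sum $\sum\sum (x_{\mu\nu} - a)^2$ with denominators mn − 1 and mn; the second has denominators m − 1 and m.]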

We see, however, that rows and columns do not play the same role here because $\Sigma$ depends
only on the $a_\mu$, the average values of the rows. If the observed value of $\Sigma$ happens to be
larger (smaller) than the value of $\Sigma'$, we speak of supernormal (subnormal) dispersion.
It is well known that supernormal dispersion can be explained by assuming that the $m \cdot n$
theoretical populations are only equal “by rows” but not by columns (there are m different
distributions); in the same way one can explain the case of subnormal dispersion by admitting
that the distributions are equal “by columns,” but not by rows.

Another explanation which may sometimes seem more plausible is the following: All
the $m \cdot n$ distributions are supposed to be equal, but we omit the assumption of mutual independence.
Then one can prove that the supernormal or subnormal dispersion corresponds
respectively to an appropriately defined “positive” or “negative correlation.” The fact
that normal dispersion occurs rather rarely in social questions is then reflected by the idea
that social phenomena are in fact not independent of each other but are usually only assumed
so for the purpose of simplicity. In that way the more frequent occurrence of
supernormal dispersion likewise finds an adequate explanation.




THE CYCLIC EFFECTS OF LINEAR GRADUATIONS PERSISTING IN 
THE DIFFERENCES OF THE GRADUATED VALUES 

By Edward L. Dodd 
University of Texas 

1. Scope of inquiry. Slutzky [1] applied the moving sum, the repeated
moving sum, and other linear processes to random numbers obtained from
lottery drawings. But the graph of the moving sum becomes, when the vertical
scale is changed in the ratio of n to 1, the graph of the moving average, the simplest
form of graduation. When cyclic effects are studied, there is no essential difference
between a moving sum and a moving average, nor between a general linear
process with coefficients $a_1, a_2, \cdots, a_n$, having sum $A \neq 0$, and the corresponding
graduation, with coefficients $a_i' = a_i/A$. Thus Slutzky’s work throws
considerable light upon graduation, although his main interest was in summation.

Slutzky found that the graphs of moving sums of random numbers bore 
strong resemblance to graphs of economic phenomena, such as [1, p. 110] that
of English business cycles from 1855 to 1877. In fact, Slutzky regards the 
fluctuations in economic phenomena as due largely to a synthesizing of random 
causes. 

In general the undulatory character of such values cannot be described as
periodic, since the waves are of different length. But Slutzky found that, upon
operating on random data having mean zero and constant variance, the resulting
values approach a sinusoidal limit under certain conditions, in particular, when
a set of n summations by twos is followed by m differencings, and as $n \to \infty$,
$m/n \to$ a constant. Romanovsky [2] generalized this result by taking successive
summations of s consecutive elements of the data, with $s \geq 2$; but required that
$m/n \to \alpha \neq 1$. However, the cases which are of interest to me just now are
those for which m = n − 1 or m = n − 2; and for these cases $m/n \to 1$. Romanovsky
considers the case of m = n − 1 (not, however, as leading to a
sinusoidal limit) and gives in formula (46) the value of a coefficient of correlation,
which I deduce directly. From his formula (43) a corresponding coefficient
of correlation can be obtained for the case of m = n − 2, as the sum of
certain products. I need a simpler expression than this, which I obtain
directly. In my treatment, these coefficients are the cosines of angles; and the
ratio of such an angle to a whole revolution is an expected frequency of
occurrence.

After setting forth in Section 2 some preliminary formulas, I treat in Section 3
the results of applying to random data an indefinite number k + 2 of summations
or averagings, followed by k differencings, the number of terms in a sum
remaining fixed. In Section 4, however, only a few differencings are applied to a
graduation. In particular the Spencer 21-term formula is studied in some
detail. In former papers [3, 4] I have dealt with the immediate effects of
graduations upon random data.

The question to be considered in this paper is this: Do the cyclic effects appearing
in the graduated values persist in the successive differences? And, if so, do
these effects fade out gradually or, on the other hand, do they come to a rather abrupt
termination?

These differences of graduated values, indeed, up to the third, fourth or fifth 
are of considerable importance. Henderson [5] defines the smoothing coefficient 
of a given graduation as the ratio of the theoretical standard deviation of the 
third differences for the graduated values to that for the original values or data. 

2. Preliminary notions and formulas. The data to be graduated will be supposed
to be independent, or uncorrelated, or as Slutzky expresses it, “incoherent.”
This will imply that the expected value of the product of two different
chance variates is the product of their expected values.

Now the operations of summing and differencing as used here are not inverse.
To illustrate: Given as independent u, v, w, x, y, z, ···. Summing by twos
yields the sequence u + v, v + w, w + x, x + y, y + z, ···. But the first
differences of these numbers, w − u, x − v, y − w, z − x, ···, are alternately
correlated; thus w − u is negatively correlated with y − w; x − v with z − x,
etc. Indeed, successive differencing following successive summing does not lead
back to the original condition of incoherency. However, under certain conditions,
the resulting coherency may be so slight that the final succession of numbers
may have just about the same chaotic properties as the succession of data.
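
The illustration above can be checked mechanically by treating each linear process as a list of coefficients and composing processes by polynomial multiplication. The following is a minimal sketch under that assumption; the helper names `compose` and `lag_correlation` are illustrative and do not come from the paper. It shows that summing by twos followed by one differencing gives the coefficients −1, 0, 1, so that adjacent results are uncorrelated while results two steps apart have correlation −1/2.

```python
from fractions import Fraction

def compose(a, b):
    """Coefficients of process a followed by process b (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def lag_correlation(c, p):
    """Correlation between outputs p steps apart, for uncorrelated inputs
    of equal variance: sum of c_r * c_{r+p} divided by sum of c_r^2."""
    num = sum(c[r] * c[r + p] for r in range(len(c) - p))
    return Fraction(num, sum(x * x for x in c))

sum_by_twos = [1, 1]   # u + v, v + w, ...
difference = [-1, 1]   # next value minus current value
coeffs = compose(sum_by_twos, difference)

print(coeffs)                      # coefficients of w - u
print(lag_correlation(coeffs, 1))  # adjacent differences
print(lag_correlation(coeffs, 2))  # differences two steps apart
```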

In my paper [3, p. 262], I set forth a number of features on the basis of which
a cycle length could be defined. One of these involves the frequency of maxima.
Given independent chance variables, each subject to the same law of distribution,

(1)    $P(x_i \leq x) = \Phi(x)$;

where $\Phi(x)$ has a derivative $\phi(x)$. It is then easy to see that the expected relative
frequency of maxima is 1/3. That is:

(2)    $P(x_{i-1} \leq x_i \geq x_{i+1}) = \int [\Phi(x)]^2 \phi(x)\,dx = 1/3$.
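
The value 1/3 can also be seen without integration: when ties have probability zero, the six orderings of three independent, identically distributed values are equally likely, and the middle value is the largest in two of them. The sketch below simply enumerates the orderings; the function name `maximum_frequency` is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

def maximum_frequency():
    """Exact chance that the middle of three exchangeable, tie-free values
    exceeds both of its neighbours."""
    orders = list(permutations(range(3)))
    hits = sum(1 for o in orders if o[1] > o[0] and o[1] > o[2])
    return Fraction(hits, len(orders))

freq = maximum_frequency()
print(freq)       # expected relative frequency of maxima
print(1 / freq)   # corresponding cycle length
```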

Now, for a given feature, a cycle length is defined as the reciprocal of the theoretic
relative frequency. Then the cycle length here for maxima is three. It is well
known that averaging tends to remove maxima. Thus, upon averaging or
summing, the cycle length tends to increase. It is almost as well known that
differencing tends to increase the frequency of maxima, and thus decrease cycle
length. For if $z_i = \Delta y_i = y_{i+1} - y_i$, then between two maxima of $y_i$ there is
at least one minimum (strong or weak) of $y_i$; and following this minimum and
before passing the next maximum of $y_i$ there is at least one maximum of $z_i$. Successive
differencing tends to reduce the cycle length of maxima from 3 to 2,
that is to make the graph a perfect zig-zag where positive and negative values
of $z_i$ alternate. A set of differencings following a set of summings may bring
the cycle length from some fairly large number back to about 3, and thus restore
something like the original chaotic appearance in the graph.

In dealing with the foregoing $\Phi(x)$ or $\phi(x)$ in (2), it was not assumed that the
distribution be normal. But, in what follows, it will be assumed that

(3)    $\phi(x) = \frac{1}{\sigma(2\pi)^{1/2}}\, e^{-(x-\mu)^2/2\sigma^2}$

and, for convenience, $\mu$ will be taken as zero; that is, the data will be supposed
given as deviations from their theoretic mean. Actually, the data used by
Slutzky and the data I have used belong to a rectangular distribution, as noted
in my former paper. Nevertheless the close agreement between actual and expected
results seems to indicate [3, p. 263] that the theory is in general applicable.
It is well known that averaging of observations from non-normal distributions
may lead rather quickly to an approximately normal distribution.

Given n real numbers, $a_1, a_2, \cdots, a_n$, let

(4)    $y_j = a_1 x_i + a_2 x_{i+1} + \cdots + a_n x_{i+n-1}; \quad i = 1, 2, 3, \cdots$.

Then $y_j$ is the moving sum if each $a_r = 1$. Slutzky takes j = i or j = i + n − 1.
Again, $y_j$ is the moving average if each $a_r = 1/n$. For graduation in general,
the condition $\sum a_r = 1$ is imposed; and usually j = i + (n − 1)/2. If n is odd,
$y_j$ is thus associated with the middle x.

Under the assumption that the x’s are independent and normally distributed
about mean zero, with constant variance, I have proven [3, p. 256]: The probability
that for any specified j, $y_{j-1} < 0$ and $y_j > 0$ is given by $P = \theta/360^\circ$,
where

(5)    $\cos\theta = \sum_{r=1}^{n-1} a_r a_{r+1} \Big/ \sum_{r=1}^{n} a_r^2$.

The expected relative frequency of up-crossings of the graph of the y’s through
the zero base line is then $\theta/360^\circ$. That is: $\theta/360^\circ$ is the expected relative frequency
of a change in the sign of y from − to +; also, of a change in sign
from + to −.
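
Formula (5) is easy to evaluate for any coefficient list. The sketch below is an illustration only; the function names `cos_theta` and `upcrossing_cycle_length` are assumptions introduced here. It computes $\cos\theta$ and the reciprocal of $\theta/360^\circ$, that is, the mean spacing between changes from − to +, for the raw data (a single coefficient 1) and for a moving sum of five terms.

```python
import math

def cos_theta(a):
    """Formula (5): sum of a_r * a_{r+1} divided by sum of a_r^2."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    return num / sum(x * x for x in a)

def upcrossing_cycle_length(a):
    """Reciprocal of theta/360 degrees: mean spacing of changes from - to +."""
    theta = math.degrees(math.acos(cos_theta(a)))
    return 360.0 / theta

raw = [1]               # identical "graduation" y_j = x_j
moving_sum_5 = [1] * 5  # moving sum of five terms

print(upcrossing_cycle_length(raw))
print(cos_theta(moving_sum_5), upcrossing_cycle_length(moving_sum_5))
```

For the raw data the numerator of (5) is empty, so $\cos\theta = 0$ and the cycle length is 4; for a moving sum of n terms the same formula gives $\cos\theta = (n - 1)/n$.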

But, as $\Delta y_j = y_{j+1} - y_j$, it follows that

(6)    $\Delta y_j = b_1 x_i + b_2 x_{i+1} + \cdots + b_n x_{i+n-1} + b_{n+1} x_{i+n}$,

where

(7)    $b_1 = -a_1, \quad b_{n+1} = a_n, \quad b_r = a_{r-1} - a_r, \quad r = 2, 3, \cdots, n$;

and since a maximum for the y’s at $y_j$ occurs when $\Delta y_{j-1} > 0$, $\Delta y_j < 0$, it follows
that the theoretic frequency therefor is $\theta'/360^\circ$, where

(8)    $\cos\theta' = \sum_{r=1}^{n} b_r b_{r+1} \Big/ \sum_{r=1}^{n+1} b_r^2$.

In a similar manner, by using second differences, we get the expected relative
frequency $\theta''/360^\circ$ for inflexional points, in specified direction. Moreover,
$\theta \leq \theta' \leq \theta'' \leq \cdots \leq 180^\circ$; since inflections must be at least as frequent as
maxima, etc.

If the foregoing formulas are applied to the identical “graduation” $y_j = x_j$,
then $\cos\theta = 0$, $\cos\theta' = -1/2$, $\cos\theta'' = -2/3$. In fact,

(9)    $\cos\theta^{(t)} = -t/(t + 1)$.

This follows from the fact that the b’s and similar coefficients are the binomial
coefficients; and

(10)    $\sum_{r=0}^{t} ({}_tC_r)^2 = {}_{2t}C_t; \quad \sum_{r=0}^{t-1} {}_tC_r \cdot {}_tC_{r+1} = {}_{2t}C_{t-1}$.

Thus repeated differencing leads toward the perfect zig-zag. An extension of
this feature will be taken up in the next section.
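
Formula (9) can be confirmed by differencing the identical graduation repeatedly, using the rule (7) for the coefficients and the ratio (5) for the cosine. The sketch below does this with exact fractions; the helper names are illustrative, and the cycle length printed is $360^\circ/\theta^{(t)}$ rounded to two decimals.

```python
import math
from fractions import Fraction

def difference(a):
    """Coefficients of the first difference of a process, as in (7)."""
    return [-a[0]] + [a[r - 1] - a[r] for r in range(1, len(a))] + [a[-1]]

def cos_theta(a):
    """Ratio (5), kept exact."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    return Fraction(num, sum(x * x for x in a))

coeffs = [1]   # identical graduation y_j = x_j
results = []
for t in range(5):
    c = cos_theta(coeffs)
    cycle = 360.0 / math.degrees(math.acos(float(c)))
    results.append((t, c, round(cycle, 2)))
    coeffs = difference(coeffs)   # pass to the next order of difference

for row in results:
    print(row)
```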

3. Repeated summing and differencing. To indicate the result of the summing
of n consecutive numbers in a sequence, I shall use the notation $1^n$. And
the difference $\Delta y_i = y_{i+1} - y_i$ will be indicated by $-1, 0^{n-1}, 1$. Thus if n = 3,
$1^3$ and $-1, 0^2, 1$ will stand respectively for

(11)    $y_i = x_{i-1} + x_i + x_{i+1}; \quad \Delta y_i = -x_{i-1} + 0x_i + 0x_{i+1} + x_{i+2}$.

If, now, $z_i = y_{i-1} + y_i + y_{i+1}$, then

(12)    $z_i = x_{i-2} + 2x_{i-1} + 3x_i + 2x_{i+1} + x_{i+2}$.

Since (n) is often used to indicate the operation of summing n consecutive numbers,
we may write

(13)    $(3)^2 = 1, 2, 3, 2, 1; \quad (n)^2 = 1, 2, \cdots, (n - 1), n, (n - 1), \cdots, 2, 1$.

Then, for n > 2,

(14)    $\Delta(n)^2 = -1^n, 1^n; \quad \Delta^2(n)^2 = 1, 0^{n-1}, -2, 0^{n-1}, 1$.

And, since the operations of summing and differencing are commutative, we
are led to

(15)    $F_k = (-1)^k \Delta^k (n)^k = {}_kC_0, 0^{n-1}, -{}_kC_1, 0^{n-1}, {}_kC_2, 0^{n-1}, \cdots, (-1)^k {}_kC_k$;

as may be established by induction. For from the foregoing, it follows that

(16)    $(-1)^k \Delta^k (n)^{k+1} = {}_kC_0^{\,n}, -{}_kC_1^{\,n}, \cdots, (-1)^k {}_kC_k^{\,n}$.

Then, since ${}_{k+1}C_r = {}_kC_r + {}_kC_{r-1}$, we conclude that

(17)    $F_{k+1} = (-1)^{k+1} \Delta^{k+1} (n)^{k+1} = {}_{k+1}C_0, 0^{n-1}, -{}_{k+1}C_1, 0^{n-1}, \cdots, (-1)^{k+1} {}_{k+1}C_{k+1}$.

If now $n \geq 2$, then from (5) and (15) we find that

(18)    $\cos\theta = 0; \quad \theta/360^\circ = 1/4$.

Thus, the expected frequency of the changes in sign of $\Delta^k(n)^k$ is the same as
that for the raw or ungraduated data. Moreover, if $n \geq 3$, (8) leads to $\cos\theta' = -1/2$,
found for the data. For, in this case, at least two zero coefficients
intervene between any two non-zero coefficients. And thus

(19)    $\cos\theta' = -\sum ({}_kC_r)^2 \Big/ 2\sum ({}_kC_r)^2 = -1/2$.

In fact, the same factor cancels from numerator and denominator as we take
higher differences, if a sufficient number of zeros intervene. More explicitly
stated, the formula (9) found for the data is valid also for $\Delta^k(n)^k$, provided
$n \geq t + 2$.

To make this more concrete, it may be noted that cycle lengths corresponding
to t = 0, 1, 2, 3, and 4, are respectively

(20)    4, 3, 2.73, 2.60, 2.52.

From (15), we see directly that an element of $\Delta^k(n)^k$ is correlated only with
certain other elements which are at distances from it which are multiples of n.
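
The pattern (15) and the consequence (18) can be verified numerically for particular n and k by multiplying out the coefficient lists. This is a sketch, with illustrative helper names and the arbitrary choice n = 4, k = 3; it relies on the fact that successive linear processes compose by polynomial multiplication of their coefficients.

```python
from math import comb

def compose(a, b):
    """Coefficients of process a followed by process b (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def power(op, times):
    """Coefficients of a process applied `times` times in succession."""
    out = [1]
    for _ in range(times):
        out = compose(out, op)
    return out

def delta_k_sum_k(n, k):
    """Coefficients of (-1)^k Delta^k (n)^k, the left member of (15)."""
    c = compose(power([1] * n, k), power([-1, 1], k))
    return [(-1) ** k * x for x in c]

def spaced_binomials(n, k):
    """Right member of (15): signed binomial coefficients, n - 1 zeros between."""
    out = []
    for r in range(k + 1):
        out.append((-1) ** r * comb(k, r))
        if r < k:
            out.extend([0] * (n - 1))
    return out

n, k = 4, 3
f = delta_k_sum_k(n, k)
print(f)
print(f == spaced_binomials(n, k))
print(sum(x * y for x, y in zip(f, f[1:])))   # numerator of (5)
```

Because no two non-zero coefficients are adjacent when n is at least 2, the numerator of (5) vanishes, which is the content of (18).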

Some of the foregoing results may be included in a theorem as follows:
Theorem: Given a sequence of independent chance variates, each subject to the
normal distribution (3) with mean zero. Upon this material, let k summings or
averagings by n be performed and k differencings, in any order. Then the resulting
sequence has something of the same chaotic nature as the data. In particular for
$n \geq 2$ the expected frequency of changes of sign is the same, viz., 1/4 for change
from minus to plus and 1/4 for change from plus to minus. Moreover, as n is
increased from 2 to 3, 4, 5, ···, the expected frequency of other characteristics
becomes the same, maxima and minima, points of inflection, etc., in accordance
with (9).

But, suppose now that after k + 1 summings by n, only k differencings are performed.
Is the resulting sequence almost chaotic? Hardly so. At least, it
can be shown that changes of sign in each direction have no longer an expected
frequency fixed at 1/4; but this expected frequency decreases as n increases.
To show this, formula (5) is applied to (16); and setting in (10), $C = {}_{2k}C_k$,
$C' = {}_{2k}C_{k-1}$, it follows that

(21)    $\cos\theta = [(n - 1)C - C']/nC = 1 - (2k + 1)/n(k + 1)$.

Then $\cos\theta > 1 - 2/n$; and the cycle length for expected changes of sign in
definite direction is somewhat greater than that obtained by setting $\cos\theta = 1 - 2/n$.
For values of n not too small, we may write $\cos\theta = 1 - \theta^2/2$, approximately;
and then approximately

(22)    cycle length for definite change of sign in $\Delta^k(n)^{k+1}$ is $\pi\sqrt{n}$.

If n = 9, this approximate length is 9.4, assuming k fairly large, whereas the
more exact length is 9.2.
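
As a check on (21) and (22), the coefficients of $\Delta^k(n)^{k+1}$ may be built directly and the ratio (5) compared with the closed form. The sketch below uses exact fractions; the helper names and the choice n = 9, k = 6 are illustrative assumptions. The last two lines evaluate the large-k limit $\cos\theta = 1 - 2/n$ and the approximation $\pi\sqrt{n}$ for n = 9.

```python
import math
from fractions import Fraction

def compose(a, b):
    """Coefficients of process a followed by process b (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def power(op, times):
    """Coefficients of a process applied `times` times in succession."""
    out = [1]
    for _ in range(times):
        out = compose(out, op)
    return out

def cos_theta(a):
    """Ratio (5), kept exact."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    return Fraction(num, sum(x * x for x in a))

def cos_theta_formula_21(n, k):
    """Closed form (21)."""
    return 1 - Fraction(2 * k + 1, n * (k + 1))

n, k = 9, 6
coeffs = compose(power([1] * n, k + 1), power([-1, 1], k))
exact = cos_theta(coeffs)

print(exact == cos_theta_formula_21(n, k))
print(round(2 * math.pi / math.acos(float(exact)), 2))   # cycle length for this k
print(round(2 * math.pi / math.acos(1 - 2 / n), 1))      # limit for large k
print(round(math.pi * math.sqrt(n), 1))                  # approximation (22)
```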

Consider now the result of summing k + 2 times, and then differencing only k
times. For this purpose, a few formulas for summing squares will be useful.
By the method of differences it can be shown that if $l = a + nh$ and

(23)    $T = a^2/2 + (a + h)^2 + (a + 2h)^2 + \cdots + (a + \overline{n-1}\,h)^2 + l^2/2$,

then

(24)    $T = n(a^2 + al + l^2)/3 + (l - a)^2/6n$.
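
The identity between (23) and (24) can be spot-checked with exact arithmetic. The sketch below compares the direct sum, with its two end terms halved, against the closed form over a small grid of integer values; the function names are illustrative.

```python
from fractions import Fraction

def t_direct(a, h, n):
    """Left member (23): squares of a, a + h, ..., a + n h, end terms halved."""
    terms = [(a + i * h) ** 2 for i in range(n + 1)]
    return Fraction(terms[0], 2) + sum(terms[1:-1]) + Fraction(terms[-1], 2)

def t_closed(a, h, n):
    """Right member (24), with l = a + n h."""
    l = a + n * h
    return Fraction(n * (a * a + a * l + l * l), 3) + Fraction((l - a) ** 2, 6 * n)

mismatches = [
    (a, h, n)
    for a in range(-3, 4)
    for h in range(-2, 3)
    for n in range(1, 6)
    if t_direct(a, h, n) != t_closed(a, h, n)
]
print(mismatches)   # an empty list means (23) and (24) agree on the grid
```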

Suppose, now, that a/n takes on the values $0, {}_kC_0, -{}_kC_1, \cdots, (-1)^k {}_kC_k$ in
succession, while l/n takes on the values ${}_kC_0, -{}_kC_1, \cdots, (-1)^k {}_kC_k, 0$. Let U
be the sum of the (k + 2) values of T thus obtained. Then by (10),

(25)    $U = n^3 (2 \cdot {}_{2k}C_k - {}_{2k}C_{k-1})/3 + n \sum_{r=0}^{k+1} ({}_{k+1}C_r)^2/6$

(26)    $\quad = \frac{n^3 (k + 2)(2k)!}{3\, k!\,(k + 1)!} + \frac{n}{6}\, {}_{2k+2}C_{k+1}$.

Now, by applying to (16) one more summation by n, there are formed (k + 2)
arithmetic progressions of (n + 1) terms each, alternately increasing and decreasing.
The maximum and minimum terms at the juncture of the progressions
are to be split into two halves to apply (23). Then the sum of the squares of
these coefficients is given by (26). This forms a denominator for (5).

To obtain the numerator for (5) we note that from $ab = [a^2 + b^2 - (a - b)^2]/2$
it follows that if

(27)    $V = a(a + h) + (a + h)(a + 2h) + \cdots + (a + \overline{n-1}\,h)(a + nh)$;

then, from (23),

(28)    $V = T - nh^2/2 = T - (l - a)^2/2n$.

If now W is the sum of such V’s, reference to the last terms of (24) and (26)
shows that

(29)    $W = U - (n/2)\, {}_{2k+2}C_{k+1}$.

And hence, from (5),

(30)    $\cos\theta = \frac{(k + 2)n^2 - 4k - 2}{(k + 2)n^2 + 2k + 1}$.

Then

(31)    $\cos\theta > \frac{n^2 - 4}{n^2 + 2}$;

but only slightly greater when k is large. Again

(32)    $\cos\theta > 1 - 6/n^2$;

but only slightly greater when n is not small. In this case, $\cos\theta = 1 - \theta^2/2$,
approximately. And thus, approximately, for large k, and for n not small

(33)    cycle length for definite change of sign of $\Delta^k(n)^{k+2}$ = 1.81 n.

This gives for n = 10 a cycle length of 18.1; whereas, if $\cos\theta$ is taken as the
right member of (31), the cycle length is 18.2.
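
Formula (30) and the figures quoted for n = 10 can be confirmed in the same way, by building the coefficients of $\Delta^k(n)^{k+2}$ and applying (5). This is a sketch with illustrative helper names and an arbitrary k = 5; the constant 1.81 in (33) is $\pi/\sqrt{3}$, obtained from $\theta^2/2 = 6/n^2$.

```python
import math
from fractions import Fraction

def compose(a, b):
    """Coefficients of process a followed by process b (polynomial product)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def power(op, times):
    """Coefficients of a process applied `times` times in succession."""
    out = [1]
    for _ in range(times):
        out = compose(out, op)
    return out

def cos_theta(a):
    """Ratio (5), kept exact."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    return Fraction(num, sum(x * x for x in a))

def cos_theta_formula_30(n, k):
    """Closed form (30)."""
    return Fraction((k + 2) * n * n - 4 * k - 2, (k + 2) * n * n + 2 * k + 1)

n, k = 10, 5
coeffs = compose(power([1] * n, k + 2), power([-1, 1], k))
print(cos_theta(coeffs) == cos_theta_formula_30(n, k))

limit = Fraction(n * n - 4, n * n + 2)                    # right member of (31)
print(round(2 * math.pi / math.acos(float(limit)), 1))    # cycle length from (31)
print(round(math.pi * n / math.sqrt(3), 1))               # approximation (33)
```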

Thus, if a (k + 2)-fold summation or averaging of random data is followed
by only k differencings, the resulting graduation or linear processing $z = \Delta^k(n)^{k+2}$
is decidedly not as chaotic as the data; as seen from (31) and (33). But further,
$\Delta z = \Delta^{k+1}(n)^{k+2}$; and thus from (22) the cycle length for the expected maxima
of z is about $\pi\sqrt{n}$.

Now Slutzky [1, p. 109] distinguished conspicuous waves from inconsequential 
“ripples.” On this basis, the frequency of significant cyclical features for a 
chance variable, such as z , would be less than the frequency of the maxima. It 
is not so clear that the frequency of significant features of a chance variable 
will be greater than that for changes of sign in definite direction. That turned 
out to be true for graduated values such as discussed in my earlier paper 
[3, p. 262]. If this be also valid for z , we would expect that conspicuous “waves” 
of $\Delta^k(n)^{k+2}$ would have average length between $\pi\sqrt{n}$ and 1.81n, except for small
values of n and k.

4. Graduations or linear processes and their successive differences. If double
summation by n is followed by a single differencing, the result, as indicated in
(14), is, for n = 3,

(34)    $y_j = -x_i - x_{i+1} - x_{i+2} + x_{i+3} + x_{i+4} + x_{i+5}$.

Then

(35)    $y_{j+3} = -x_{i+3} - x_{i+4} - x_{i+5} + x_{i+6} + x_{i+7} + x_{i+8}$.

Thus $y_j$ and $y_{j+3}$ are negatively correlated, since $x_{i+3}$, $x_{i+4}$, and $x_{i+5}$ appear
in each, but with sign changed. This would seem to tend to make maxima
alternate with minima at distances of about 3; or at distances of n, in the general
case (14). Here, following Slutzky and Romanovsky, the coefficient of correlation
$r_p$ between elements at a distance of p is taken as

(36)    $r_p = E(X_r \cdot X_{r+p})/E(X_r^2)$.

Using computed averages, instead of expected values, Alter [6] recommends
a “correlation periodogram,” in which $r_p$ is the ordinate for abscissa p.
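
For a linear process applied to uncorrelated data of zero mean and equal variance, the expected values in (36) reduce to sums of products of the coefficients. The sketch below tabulates the theoretical $r_p$ for the process (34); it is an illustration using expected values rather than the computed averages of Alter's periodogram, and the function name `correlation` is an assumption.

```python
from fractions import Fraction

def correlation(c, p):
    """r_p of (36) for coefficients c applied to uncorrelated data of zero
    mean and equal variance: sum of c_r * c_{r+p} over sum of c_r^2."""
    num = sum(c[r] * c[r + p] for r in range(len(c) - p))
    return Fraction(num, sum(x * x for x in c))

coeffs_34 = [-1, -1, -1, 1, 1, 1]   # the process (34)
periodogram = [correlation(coeffs_34, p) for p in range(7)]
print([str(r) for r in periodogram])
```

The most negative ordinate falls at p = 3, in line with the remark that maxima tend to alternate with minima at distances of about 3.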

Moreover, we would expect a graduation (4) with coefficients $a_r$ proportional
to the ordinates y of the sinusoid $y = \sin(\alpha + 2\pi x/p)$ taken for x = 1, 2, 3, ···
to impress upon random data oscillations with maxima separated from minima
by about p/2. But such $a_r$, as well as those in (34), have abrupt endings which
introduce noticeable alterations. More satisfactory results come from tapering
ends, such as appear in damped vibration, with coefficients about proportional
to $e^{-c|x|} \cos 2\pi x/p$ or to $e^{-c|x|} \sin 2\pi x/p$. H. Labrouste and Mrs. Labrouste [7]
give a powerful operator of this description.

Slutzky (loc. cit. pp. 119-123), Yule [8], and Walker [9] make use of damped 
harmonic vibration to explain the creation of cycles; while Bartels [10] approaches
by a different method the oscillations that do not last.

Now the common graduation formulas have coefficients not conforming strictly
to damped vibration, as the tapering ends vibrate more quickly. However,
these ends have little more than a smoothing or stabilizing effect. Furthermore,
the coefficients for first differences are likely to conform to something like
$\sin 2\pi x/p$. Some experimental evidence will be presented for the following
conclusion:

If the coefficients $a_r$ of a graduation or linear process (4) appear to conform
roughly to equidistant ordinates of a damped vibration, $\pm e^{-c|x|} \cos 2\pi x/p$ or
$\pm e^{-c|x|} \sin 2\pi x/p$, with changes of sign at intervals of p/2, then when this process
(4) is applied to independent chance data having zero mean and constant variance,
there is a tendency for the graduated or processed values to change sign at intervals
of about p/2.
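
A theoretical counterpart of this conclusion can be sketched with formula (5). The example below builds coefficients $e^{-c|x|} \cos 2\pi x/p$ and computes the expected interval between changes of sign, $180^\circ/\theta$. The damping c = 0.05, the period p = 12, and the truncation at |x| ≤ 60 are arbitrary assumptions chosen for illustration, not values from the paper.

```python
import math

def cos_theta(a):
    """Formula (5): sum of a_r * a_{r+1} divided by sum of a_r^2."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    return num / sum(x * x for x in a)

def damped_cosine(c, p, half_width):
    """Coefficients exp(-c|x|) cos(2 pi x / p) for x = -half_width..half_width."""
    return [math.exp(-c * abs(x)) * math.cos(2 * math.pi * x / p)
            for x in range(-half_width, half_width + 1)]

p = 12
coeffs = damped_cosine(c=0.05, p=p, half_width=60)
theta = math.acos(cos_theta(coeffs))
sign_change_interval = math.pi / theta   # half of the cycle length 2*pi/theta

print(round(sign_change_interval, 2), p / 2)
```

With light damping the computed interval lies close to p/2; heavier damping or abrupt truncation would be expected to loosen the agreement.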

A number of standard graduations have first and second differences (see (6),
(7)) which bear a decided resemblance to damped vibrations, while the third or
fourth differences have only moderate, if any, cyclic appearance. This is especially
true of those graduations which are constructed by applying three summings,
the number of terms in a sum being in general different, and a fourth
process with negative coefficients. This is, indeed, a favorite form of graduation,
with which are associated the names of Woolhouse, Spencer, Higham,
Kenchington, Henderson, etc. The Spencer 21-term formula, for which some
features have already been described [3, p. 262], will now be examined, with
special reference to its differences. Cycle length for change of sign is one-half
that for change from minus to plus.


TABLE I

Coefficients (×350) for Spencer 21-term graduation and for first four differences.
Also theoretical cycle lengths for change in sign in values obtained from
random data

          Coefficients                                                                                        Cycle Length

Grad.     −1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, 57, 47, 33, 18, 6, −2, −5, −5, −3, −1                       10.7
1st D.    1, 2, 2, 0, −3, −8, −12, −15, −14, −10, −3, 3, 10, 14, 15, 12, 8, 3, 0, −2, −2, −1                       7.0
2nd D.    −1, −1, 0, 2, 3, 5, 4, 3, −1, −4, −7, −6, −7, −4, −1, 3, 4, 5, 3, 2, 0, −1, −1                           5.5
3rd D.    1, 0, −1, −2, −1, −2, 1, 1, 4, 3, 3, −1, 1, −3, −3, −4, −1, −1, 2, 1, 2, 1, 0, −1                        3.2
4th D.    −1, 1, 1, 1, −1, 1, −3, 0, −3, 1, 0, 4, −2, 4, 0, 1, −3, 0, −3, 1, −1, 1, 1, 1, −1                       1.6

In the graduation formula, itself, there are 11 positive coefficients, centrally
located, and relatively large as compared with the negative coefficients. This
11 is close to 10.7, the theoretical cycle length for changes of sign of $y_r - 4.5$,
the difference between the graduated value $y_r$ and its mean, the arithmetic
mean of 0, 1, 2, ···, 9. The structure of the first and second differences also
matches closely the corresponding cycle lengths. In the third differences, there
is a break at the center; but still there appears considerable regularity. But
among fourth differences, the zigzag is the prominent feature. Now the theorem
of Section 3 does not really apply to the Spencer formula, with its two summations
by fives and one summation by sevens, and another process. But it is not
surprising that the cyclicity ceases after passing the third differences.
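
The theoretical cycle lengths for the Spencer formula and its differences can be recomputed from the coefficients by rule (7) and formula (5). The sketch below is illustrative (the function names are assumptions); it uses the standard Spencer 21-term coefficients multiplied by 350 and prints the cycle length for change of sign, $180^\circ/\theta$, for the graduation and its first four differences.

```python
import math

# Spencer 21-term coefficients, multiplied by 350
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

def difference(a):
    """Coefficients of the first difference of a process, as in (7)."""
    return [-a[0]] + [a[r - 1] - a[r] for r in range(1, len(a))] + [a[-1]]

def sign_change_cycle(a):
    """Half of 360/theta, with theta taken from formula (5)."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    den = sum(x * x for x in a)
    return 180.0 / math.degrees(math.acos(num / den))

coeffs = SPENCER_21
cycles = []
for order in range(5):
    cycles.append(round(sign_change_cycle(coeffs), 1))
    coeffs = difference(coeffs)   # pass to the next order of difference

print(cycles)
```

The first, fourth, and fifth values may be compared with the figures 10.7, 3.2, and 1.6 quoted in the text.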

As a basis for comparing observed values with expected values, the tenth
digits in the 600 logarithms from log 200 to log 799 were taken as a random set
of numbers. These 600 numbers had been given a Spencer 21-term graduation
[3, pp. 261-262], yielding 580 graduated values. From these the 579 first differences
were found, the 578 second differences, etc. These numbers, 580, 579, ···,
were multiplied respectively by the expected relative frequencies of change in
sign of $y_r - 4.5$, of $\Delta y_r$, $\Delta^2 y_r$, etc., as found by use of (5), (8), and similar expressions
to form the following table.

TABLE II

Comparison of expected changes of sign with observed changes for a Spencer 21-term graduation

                            Expected Number of      Observed Number of
                            Changes from − to +     Changes from − to +

Graduated values − 4.5             27.2                     27
First differences                  41.3                     42
Second differences                 52.9                     48
Third differences                  90.4                     74
Fourth differences                176.7                    146


The most abrupt change in frequency or cycle length appears to occur in
passing from third to fourth differences. In Table I, this is seen in the configuration
of positive and negative terms, and in the drop from 3.2 to 1.6 in cycle
length; and in Table II in the corresponding increase in expected sign changes
from 90.4 to 176.7. More spectacular is the increase in the number of zigzags
represented by −, +, −, +. Among the third differences, there were
found only 13 instances of four successive terms with signs as just indicated,
whereas among fourth differences there were found 75 such instances. For
random material, about 36 such zigzags would be expected, decidedly more than
found among the third differences, and decidedly less than found among the
fourth differences.
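
The expected numbers in Table II can be approximated by multiplying the counts 580, 579, ··· by $\theta/360^\circ$ from formula (5). The sketch below does so with illustrative function names; the results should agree with the table to within about a tenth or two, the small differences presumably reflecting rounding in the original computation. The last line rests on an assumption made here about how the figure of about 36 zigzags arises: for independent values symmetric about zero, each pattern of four successive signs has probability 1/16, and 577 third differences supply 574 runs of four.

```python
import math

# Spencer 21-term coefficients, multiplied by 350
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

def difference(a):
    """Coefficients of the first difference of a process, as in (7)."""
    return [-a[0]] + [a[r - 1] - a[r] for r in range(1, len(a))] + [a[-1]]

def upcrossing_probability(a):
    """theta / 360 degrees, with theta taken from formula (5)."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    den = sum(x * x for x in a)
    return math.acos(num / den) / (2 * math.pi)

coeffs = SPENCER_21
count = 580                # number of graduated values
expected = []
for order in range(5):
    expected.append(round(count * upcrossing_probability(coeffs), 1))
    coeffs = difference(coeffs)
    count -= 1             # each differencing loses one value

print(expected)

zigzags = (577 - 3) / 16   # runs of four signs -, +, -, + in random material
print(round(zigzags, 1))
```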

The Spencer 21-term graduation appears to be fairly representative of commonly
used graduations as regards regularity or irregularity in the distribution
of positive and negative coefficients among the differences. For graduations
with a much larger number of terms, the alternation of sign in fourth differences
may not be so rapid, as, e.g., in the 35-term 5th degree parabolic graduation
which Macaulay [11] calls No. 18. On the other hand, for a formula with
non-tapering ends, such as the 13-term formula which Macaulay gives [11,
p. 64], the coefficients appearing in the differences are more irregular, especially
at the ends. While the Spencer formula is fairly representative, different formulas
have distinguishing features. If it is desirable to form an idea of what a
given formula will do to random data, a table like Table I can be constructed.

5. Summary. When upon independent chance data, summing, averaging or 
some more general graduation process is used, the graduated values tend to 
assume a wavy configuration. These waves often seem to have a fair amount 
of regularity or cyclicity. The first differences usually, and often other differences
of the graduated values, are decidedly cyclic. But, as we go in turn to
the higher differences, the cyclicity may weaken. Indeed there may be a return 
to something like randomness. And subsequent differencings may tend to set 
up zigzags. 

If (k + 2) successive summings by n have been performed on independent
chance data, with n not too small, say $n \geq 5$, then k + 2 differencings will
just about bring back the original chaotic or random condition. But with only 
k or (k + 1) differencings, a definite cyclicity remains, at least theoretically, in 
the expected values. 

In the case of the Spencer 21-term graduation, the coefficients for the successive
differences indicate the appearance of cyclicity in first, second, and third
differences.



ON THE DISTRIBUTION OF WILKS’ STATISTIC FOR TESTING THE
INDEPENDENCE OF SEVERAL GROUPS OF VARIATES 

By A. Wald 1 and R. J. Brookner 1 

Columbia University 

1. Introduction. We consider p variates $x_1, x_2, \cdots, x_p$ which have a joint
normal distribution. Let the variates be divided into k groups; group one containing
$x_1, x_2, \cdots, x_{p_1}$, group two containing $x_{p_1+1}, x_{p_1+2}, \cdots, x_{p_2}$, etc. We
are interested in testing the hypothesis that the set of all population correlation
coefficients between any two variates which belong to different groups is zero.

Wilks$^2$ has derived, by using the Neyman-Pearson likelihood ratio criterion, a
statistic based on N independent observations on each variate with which one
may test this hypothesis. Let $\|r_{ij}\|$ be the matrix of sample correlation
coefficients; Wilks’ statistic, $\lambda$, is the ratio of the determinant of the p-rowed
matrix of sample correlations to the product of the $p_1$-rowed determinant of
correlations of the variates of group one, the $(p_2 - p_1)$-rowed determinant of
correlations of the second group, etc. That is

$\lambda = \frac{|r_{ij}|}{|r_{\alpha_1\beta_1}| \cdot |r_{\alpha_2\beta_2}| \cdots |r_{\alpha_k\beta_k}|}$

where $|r_{\alpha_i\beta_i}|$ is the principal minor of $|r_{ij}|$ corresponding to the ith group.
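
As a small illustration of the definition, the ratio of determinants can be computed exactly for a given correlation matrix and grouping. The sketch below is not taken from the paper: the function names, the 3 × 3 matrix, and the grouping into sizes 2 and 1 are assumptions chosen so that the arithmetic is easy to follow.

```python
from fractions import Fraction

def determinant(m):
    """Determinant by exact Gaussian elimination on fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det          # a row exchange changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def wilks_lambda(corr, group_sizes):
    """Determinant of the full matrix over the product of the principal
    minors belonging to the successive groups."""
    product = Fraction(1)
    start = 0
    for size in group_sizes:
        block = [row[start:start + size] for row in corr[start:start + size]]
        product *= determinant(block)
        start += size
    return determinant(corr) / product

half = Fraction(1, 2)
corr = [[1, 0, half],
        [0, 1, 0],
        [half, 0, 1]]
lam = wilks_lambda(corr, [2, 1])
print(lam)
```

When every correlation between variates of different groups is zero, the determinant factors into the product of the minors and the ratio equals 1.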

In order to use the test, the distribution function of $\lambda$ must be known. Wilks
has shown that in certain cases the exact distribution is a simple elementary
function; in other cases it is an elementary function, but one which is rather
unwieldy and which does not lend itself readily to practical use. It is our
purpose in this paper (1) to show a method by which the exact distribution can
be explicitly given as an elementary function for a certain class of groupings of
the variates, and (2) to give an expansion of the exact cumulative distribution
function in an infinite series which is applicable to any grouping.

2. The exact distribution of $\lambda$. By the method to be described, the exact
distribution of $\lambda$ can be found when the numbers of variates in the groups are
such that there are an odd number in at most one group. If the number of
variates is small, say at most eight, the method will increase only slightly the
list of distribution functions that Wilks gives in his paper.

1 Research under a grant-in-aid of the Carnegie Corporation of New York. 

2 S. S. Wilks, "On the independence of k sets of normally distributed statistical 
variables," Econometrica, Vol. 3 (1935), pp. 309-326. Other references to Wilks in this paper 
except where otherwise noted are to this publication. 
For purposes of deriving the distribution of $\lambda$ we may assume that $E(x_u) = 0$, 
$(u = 1, 2, \cdots, p)$; that there are $n = N - 1$ independent observations 
$x_{u\alpha}$ $(\alpha = 1, 2, \cdots, n)$ on each variate $x_u$; and that the sample covariance 
between $x_i$ and $x_j$ is given by $s_{ij} = \sum_{\alpha=1}^{n} x_{i\alpha} x_{j\alpha}$. We define $u'$ (a function of $u$) 
to be the total number of variables in all the groups which precede the group in 
which $x_u$ lies. The complete theory is independent of the ordering of the groups 
and of the ordering of the variates within the groups; hence without loss of 
generality, we may assume that if any group contains an odd number of variates, 
it will be the last group, hence $u'$ is always an even integer. 

Wilks has shown that $\lambda$ is a product $\prod_{u=p_1+1}^{p} z_u$ where each $z_u$ is distributed 
independently of the others, and that the distribution of $z_u$ is 

(1)
$$\frac{z_u^{\frac12(n-u-1)}\,(1-z_u)^{\frac12(u'-2)}}{B[\tfrac12(n-u+1),\ u'/2]}\,dz_u.$$


Now let $y_u = \log z_u$, then the characteristic function of $y_u$ is 

$$\phi_u(t) = \frac{1}{B[\tfrac12(n-u+1),\ u'/2]}\int_0^1 z_u^{\frac12(n-u-1)+t}\,(1-z_u)^{\frac12(u'-2)}\,dz_u$$

where $t$ is a pure imaginary. It is known³ that this integral, even with complex 
exponents, is the Beta-function so long as the real parts of both exponents are 
greater than minus one, so 

$$\phi_u(t) = \frac{B[\tfrac12(n-u+1)+t,\ u'/2]}{B[\tfrac12(n-u+1),\ u'/2]} = \frac{\Gamma[\tfrac12(n-u+1)+t]\;\Gamma[\tfrac12(n-u+1+u')]}{\Gamma[\tfrac12(n-u+1+u')+t]\;\Gamma[\tfrac12(n-u+1)]}.$$

But here $u'$ is always an even integer, hence by the well known recursion formula 
of the Gamma-function, which is valid for complex arguments excluding only 
negative integers 

$$\phi_u(t) = c_u\,\{[\tfrac12(n-u+1)+t]\,[\tfrac12(n-u+3)+t]\cdots[\tfrac12(n-u+u'-1)+t]\}^{-1}$$

where 

$$c_u = [\tfrac12(n-u+1)]\,[\tfrac12(n-u+3)]\cdots[\tfrac12(n-u+u'-1)].$$
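The reduction from the Gamma-function ratio to a finite product can be checked numerically. The sketch below is only a sanity check under stated assumptions: it evaluates both forms at a real value of $t$ (the identity is analytic in $t$, but Python's `math.gamma` accepts real arguments only), and the function names are hypothetical.

```python
from math import gamma


def phi_gamma_form(n, u, u_prime, t):
    """Gamma-function ratio form of the characteristic function of y_u."""
    a = 0.5 * (n - u + 1)
    half = 0.5 * u_prime
    return gamma(a + t) * gamma(a + half) / (gamma(a + half + t) * gamma(a))


def phi_product_form(n, u, u_prime, t):
    """Finite-product form, valid when u_prime is an even integer."""
    assert u_prime % 2 == 0
    a = 0.5 * (n - u + 1)
    c_u = 1.0
    denom = 1.0
    for k in range(u_prime // 2):
        c_u *= a + k        # factors (1/2)(n-u+1), (1/2)(n-u+3), ...
        denom *= a + k + t  # the same factors shifted by t
    return c_u / denom


print(phi_gamma_form(12, 5, 4, 0.3), phi_product_form(12, 5, 4, 0.3))
```

Both forms equal one at $t = 0$, as a characteristic function must.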

3 See Whittaker and Watson, A Course in Modern Analysis, Fourth edition 1927, Chap. 12. 
Now set 

$$y = \log\lambda = y_{p_1+1} + y_{p_1+2} + \cdots + y_p$$

and the characteristic function of $y$ is 

$$\phi(t) = \prod_{u=p_1+1}^{p} c_u\,\{[\tfrac12(n-u+1)+t]\,[\tfrac12(n-u+3)+t]\cdots[\tfrac12(n-u+u'-1)+t]\}^{-1}.$$

From the characteristic function, we can obtain the distribution function, 
$g(y)$, of $y$ by the relation 

(2)
$$g(y) = \frac{c_n}{2\pi i}\int_{-i\infty}^{i\infty}\frac{e^{-yt}\,dt}{\prod_{u=p_1+1}^{p}[\tfrac12(n-u+1)+t]\cdots[\tfrac12(n-u+u'-1)+t]}$$

where 

$$c_n = \prod_{u=p_1+1}^{p} c_u.$$


The integration can be carried out by the method of residues; since $y$ is always 
negative (the range of $\lambda$ is from 0 to 1), on a half circle with center at the origin 
in the negative half of the complex $t$-plane, the integral of the function $\Phi(t)$ 
converges to zero as the radius of the circle becomes infinite. Since $\Phi(t)$ is 
analytic except for a finite number of poles on the negative real axis, $g(y)$ is $c_n$ 
times the sum of the residues at these points. 

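Now $\Phi(t)$ is of the form $\dfrac{e^{-yt}}{P(t)}$ where $P(t)$ is a polynomial in $t$ as follows: 
suppose that the groups contain $r_1, r_2, \cdots, r_k$ variables respectively, then let 
$(k_j + 1)$ be the number of these $r$'s which are greater than or equal to $j$; then 

$$P(t) = [\tfrac12(n-2)+t]^{k_1}\,[\tfrac12(n-3)+t]^{k_2}\,[\tfrac12(n-4)+t]^{k_1+k_3}\,[\tfrac12(n-5)+t]^{k_2+k_4}\,[\tfrac12(n-6)+t]^{k_1+k_3+k_5}\cdots[\tfrac12(n-p+1)+t]^{k_{p-2}+k_{p-4}+\cdots+k_{p-2[(p-1)/2]}}$$

where 

$$[\sigma/2] = \begin{cases}\sigma/2 & \text{if } \sigma \text{ is even}\\ (\sigma-1)/2 & \text{if } \sigma \text{ is odd.}\end{cases}$$

Then 

$$g(y;\, n, r_1, \cdots, r_k) = c_n\sum_{a=1}^{p-2}\frac{1}{\delta_a!}\left[\frac{\partial^{\delta_a}}{\partial t^{\delta_a}}\Big(t+\tfrac12(n-a-1)\Big)^{\delta_a+1}\Phi(t)\right]_{t=-\frac12(n-a-1)}$$

where 

$$\delta_a + 1 = k_a + k_{a-2} + \cdots + k_{a-2[(a-1)/2]}.$$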
It can be shown that $\delta_a \ge 0$ for $a$ between 1 and $p - 2$. Thus we have 
$g(y;\, n, r_1, \cdots, r_k)$ and from it we can calculate $f(\lambda;\, n, r_1, \cdots, r_k)$. 

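Suppose $p = 8$ and that the variables are divided into two groups of four each, 
then we will calculate the distribution function $f(\lambda;\, 4, 4)$. Now 

$$g(y;\, 4, 4) = \frac{c_n}{2\pi i}\int_{-i\infty}^{i\infty}\frac{e^{-yt}\,dt}{[\tfrac12(n-2)+t]\,[\tfrac12(n-3)+t]\,[\tfrac12(n-4)+t]^2\,[\tfrac12(n-5)+t]^2\,[\tfrac12(n-6)+t]\,[\tfrac12(n-7)+t]}$$

and 

$$c_n = \frac{(n-2)(n-3)(n-4)^2(n-5)^2(n-6)(n-7)}{256}.$$

Then 

$$g(y;\, 4, 4) = 16c_n\left[-\frac{e^{\frac12(n-2)y}}{90} + \frac{e^{\frac12(n-3)y}}{6} + \frac{8e^{\frac12(n-4)y}}{9} - \frac{8e^{\frac12(n-5)y}}{9} - \frac{e^{\frac12(n-6)y}}{6} + \frac{e^{\frac12(n-7)y}}{90} - \frac{y\,e^{\frac12(n-4)y}}{3} - \frac{y\,e^{\frac12(n-5)y}}{3}\right].$$

Since 

$$y = \log\lambda, \qquad dy = \frac{d\lambda}{\lambda},$$

we have 

$$f(\lambda;\, 4, 4) = \frac{16c_n}{3}\left[-\frac{\lambda^{\frac12(n-4)}}{30} + \frac{\lambda^{\frac12(n-5)}}{2} + \frac{8\lambda^{\frac12(n-6)}}{3} - \frac{8\lambda^{\frac12(n-7)}}{3} - \frac{\lambda^{\frac12(n-8)}}{2} + \frac{\lambda^{\frac12(n-9)}}{30} - \big(\lambda^{\frac12(n-6)} + \lambda^{\frac12(n-7)}\big)\log\lambda\right].$$

The cumulative distribution function is given by 

$$F_w(4, 4) = \text{Prob}\,[\lambda < w;\, 4, 4] = \frac{16c_n}{3}\,w^{\frac12(n-7)}\left[\frac{1}{15(n-7)} - \frac{w^{1/2}}{n-6} - \frac{4(4n-23)\,w}{3(n-5)^2} + \frac{4(4n-13)\,w^{3/2}}{3(n-4)^2} + \frac{w^2}{n-3} - \frac{w^{5/2}}{15(n-2)} - \left(\frac{2w}{n-5} + \frac{2w^{3/2}}{n-4}\right)\log w\right].$$

Wilks' expression for the cumulative distribution function appears to be quite 
different, but if we substitute $n = N - 1$ and use the relation 

$$I_{\sqrt w}(n-5;\, 4) = \frac{1}{B(n-5,\, 4)}\int_0^{\sqrt w} x^{n-6}(1-x)^3\,dx = \tfrac16(n-2)(n-3)(n-4)(n-5)\left[\frac{w^{\frac12(n-5)}}{n-5} - \frac{3w^{\frac12(n-4)}}{n-4} + \frac{3w^{\frac12(n-3)}}{n-3} - \frac{w^{\frac12(n-2)}}{n-2}\right]$$

it can be shown that the two formulas for the cumulative distribution are 
identical. 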
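A quick way to gain confidence in a closed form of this kind is to check that the cumulative distribution reaches exactly one at $w = 1$. The sketch below does this with exact rational arithmetic for the grouping (4, 4), taking $c_n$ as the product of the eight linear factors divided by $2^8$ and the bracketed expression of $F_w(4,4)$ as read from this section; the function names are hypothetical and the float version is included only to evaluate intermediate points.

```python
from fractions import Fraction
from math import log, sqrt


def c_n(n):
    """c_n for the grouping (4, 4): product of the c_u for u = 5, ..., 8."""
    return Fraction((n - 2) * (n - 3) * (n - 4) ** 2 * (n - 5) ** 2
                    * (n - 6) * (n - 7), 256)


def cdf_at_one(n):
    """Exact value of F_w(4, 4) at w = 1 (the log term vanishes there)."""
    bracket = (Fraction(1, 15 * (n - 7)) - Fraction(1, n - 6)
               - Fraction(4 * (4 * n - 23), 3 * (n - 5) ** 2)
               + Fraction(4 * (4 * n - 13), 3 * (n - 4) ** 2)
               + Fraction(1, n - 3) - Fraction(1, 15 * (n - 2)))
    return Fraction(16, 3) * c_n(n) * bracket


def cdf(w, n):
    """Floating-point F_w(4, 4) for 0 < w <= 1 and n >= 8."""
    root = sqrt(w)
    bracket = (1.0 / (15 * (n - 7)) - root / (n - 6)
               - 4.0 * (4 * n - 23) * w / (3 * (n - 5) ** 2)
               + 4.0 * (4 * n - 13) * w * root / (3 * (n - 4) ** 2)
               + w ** 2 / (n - 3) - w ** 2 * root / (15 * (n - 2))
               - (2.0 * w / (n - 5) + 2.0 * w * root / (n - 4)) * log(w))
    return 16.0 / 3.0 * float(c_n(n)) * w ** (0.5 * (n - 7)) * bracket


print([cdf_at_one(n) for n in range(8, 13)])
print(cdf(0.1, 10), cdf(0.5, 10))
```

The bracket involves heavy cancellation between terms of similar size, so the float version should be used with some care for large $n$.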

In cases where $u'$ is not always an even integer, the exact distribution function 
of $\lambda$ can still be obtained using this method. However, in such a case, the 
gamma functions do not cancel out and the integrand has an infinitude of 
poles, so the function is expressed by an infinite series. We will use a different 
method to obtain an infinite series expansion. 


3. A series expansion of the cumulative distribution function. Let us put 
$v = -y$, and let the density function of $v$ be $h(v)$, then from (2), we have 

$$h(v)\,dv = \frac{c_n}{2\pi i}\int_{-i\infty}^{i\infty} e^{vt}\prod_{u=p_1+1}^{p}\frac{\Gamma[\tfrac12(n-u+1)+t]}{\Gamma[\tfrac12(n-u+1+u')+t]}\,dt\;dv.$$

Since $v$ is a monotonic decreasing function of $\lambda$, and since the critical region for 
testing the null hypothesis is given by the inequality $\lambda < \lambda_0$, then the critical 
region will be defined by $v > v_0$, where $v_0$ is such that 

$$\int_{v_0}^{\infty} h(v)\,dv$$

is equal to a chosen level of significance. 
Proposition 1. 

$$h(v) = h_n(v)\,\psi(v)$$

where $\psi(v)$ does not depend on $n$, and $h_n(v) = c_n e^{-\frac12 nv}$. 

Proof: Let 

$$t' = t + \tfrac12(n-p).$$

Then 

$$h(v) = \frac{c_n}{2\pi i}\int_{-i\infty+\frac12(n-p)}^{i\infty+\frac12(n-p)} e^{v(t'-\frac12(n-p))}\prod_{u}\frac{\Gamma[\tfrac12(p-u+1)+t']}{\Gamma[\tfrac12(p-u+u'+1)+t']}\,dt'.$$

Now the area in the complex plane bounded by the vertical line through $\tfrac12(n-p)$, 
by the vertical line through the origin, and by arcs of a circle with center at the 
origin of arbitrary radius is one in which the integrand is everywhere regular. 
Furthermore, the integral along the arcs approaches zero as the radius of the 
circle approaches infinity, hence the integrals along the vertical line through 
$\tfrac12(n-p)$ and along the vertical axis are equal. Then we may write 

$$\frac{h(v)}{c_n e^{-\frac12 nv}} = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{v(t'+p/2)}\prod_{u}\frac{\Gamma[\tfrac12(p-u+1)+t']}{\Gamma[\tfrac12(p-u+u'+1)+t']}\,dt' = \psi(v).$$

Therefore 

$$h(v) = c_n e^{-\frac12 nv}\,\psi(v).$$
Proposition 2. 


where we define 


so that 


f-*i.r ^ *.i 

»-*«o r(r) 


r- s 


$$r = \tfrac12\,[\,r_2 r_1 + r_3(r_1 + r_2) + \cdots + r_k(r_1 + r_2 + \cdots + r_{k-1})\,] = \tfrac12\sum_u u'.$$

Proof: Let 


then 


Hence 


but 



v 


* 



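$$c_n = \prod_{u}\frac{\Gamma[\tfrac12(n-u+1+u')]}{\Gamma[\tfrac12(n-u+1)]}$$

and therefore 

$$I_u = \lim_{n\to\infty}\frac{\Gamma[\tfrac12(n-u+1+u')]}{\Gamma[\tfrac12(n-u+1)]}\left(\frac{2}{n}\right)^{u'/2} = 1$$

by an application of the Stirling approximation. Therefore 

$$I = \prod_u I_u = 1.$$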


We then write $\bar\psi(v) = \Gamma(r)\,\psi(v)/v^{r-1}$; hence 

(3)
$$h(v) = c_n\,e^{-\frac12 nv}\,\frac{v^{r-1}}{\Gamma(r)}\,\bar\psi(v).$$

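Proposition 3. For any positive integer $s$, 

$$\lim_{n\to\infty}\{\,n^s\cdot\text{Prob}\,(v > 1/\sqrt n)\,\} = 0.$$

Proof: Since $v = -\log\lambda$, the inequality $v > 1/\sqrt n$ is equivalent to the 
inequality $\lambda < e^{-1/\sqrt n}$. Since $\lambda = \prod_{u=p_1+1}^{p} z_u$, the inequality $\lambda < e^{-1/\sqrt n}$ implies that 
there exists at least one value of $u$ for which 

$$z_u < e^{-1/((p-p_1)\sqrt n)}.$$

Hence 

$$\sum_{u=p_1+1}^{p} P\big(z_u < e^{-1/((p-p_1)\sqrt n)}\big) \ge P\big(\lambda < e^{-1/\sqrt n}\big) = P(v > 1/\sqrt n).$$

Hence in order to prove Proposition 3 we have only to show that for each $u$ and 
any arbitrary positive integer $s$ 

$$\lim_{n\to\infty}\{\,n^s\cdot P\big(z_u < e^{-1/((p-p_1)\sqrt n)}\big)\,\} = 0.$$

From (1) we have 

$$P\big(z_u < e^{-1/((p-p_1)\sqrt n)}\big) = \frac{1}{B[\tfrac12(n-u+1);\ u'/2]}\int_0^{e^{-1/((p-p_1)\sqrt n)}} z_u^{\frac12(n-u-1)}\,(1-z_u)^{\frac12(u'-2)}\,dz_u.$$

Over the range of integration, we have $z_u < e^{-1/((p-p_1)\sqrt n)}$, so 

$$P\big(z_u < e^{-1/((p-p_1)\sqrt n)}\big) < \frac{e^{-\frac12(n-u-1)/((p-p_1)\sqrt n)}}{B[\tfrac12(n-u+1);\ u'/2]}\int_0^{e^{-1/((p-p_1)\sqrt n)}}(1-z_u)^{\frac12(u'-2)}\,dz_u$$

$$= \frac{e^{-\frac12(n-u-1)/((p-p_1)\sqrt n)}}{B[\tfrac12(n-u+1);\ u'/2]}\left[-\frac{2}{u'}(1-z_u)^{u'/2}\right]_0^{e^{-1/((p-p_1)\sqrt n)}}$$

$$= \frac{2\,e^{-\frac12(n-u-1)/((p-p_1)\sqrt n)}}{u'\cdot B[\tfrac12(n-u+1);\ u'/2]}\left[1-\big(1-e^{-1/((p-p_1)\sqrt n)}\big)^{u'/2}\right].$$

It follows from the Stirling formula that 

$$\lim_{n\to\infty}\left(\frac n2\right)^{u'/2} B[\tfrac12(n-u+1);\ u'/2] = \lim_{n\to\infty}\left(\frac n2\right)^{u'/2}\frac{\Gamma[\tfrac12(n-u+1)]\,\Gamma(u'/2)}{\Gamma[\tfrac12(n-u+u'+1)]} = \Gamma(u'/2).$$

Since 

$$\lim_{n\to\infty} n^{s+u'/2}\,e^{-\frac12(n-u-1)/((p-p_1)\sqrt n)} = 0$$

and 

$$\lim_{n\to\infty}\left(1-\big(1-e^{-1/((p-p_1)\sqrt n)}\big)^{u'/2}\right) = 1,$$

the proposition follows. 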
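The Stirling limit used in this proof, namely that $(n/2)^{u'/2}\,B[\tfrac12(n-u+1);\,u'/2]$ tends to $\Gamma(u'/2)$, is easy to illustrate numerically. The sketch below works with log-Gamma values to avoid overflow for large $n$; the function name `scaled_beta` is hypothetical.

```python
from math import exp, gamma, lgamma, log


def scaled_beta(n, u, u_prime):
    """(n/2)**(u'/2) * B[(n-u+1)/2, u'/2], computed through log-Gamma."""
    a = 0.5 * (n - u + 1)
    b = 0.5 * u_prime
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(b * log(n / 2.0) + log_beta)


for n in (10, 100, 10_000, 1_000_000):
    # The two columns approach each other as n grows.
    print(n, scaled_beta(n, 5, 3), gamma(1.5))
```

The approach to the limit is of order $1/n$, which is slow compared with the exponential factor that drives Proposition 3.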

Proposition 4. The function $\bar\psi(v)$ of formula (3) can be expanded in a power 
series, i.e. 

$$\bar\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots$$

with a finite radius of convergence. 

Proof: Wilks⁴ has considered the following integral equation: 

$$\int_0^B w^t\,g(w)\,dw = C\,B^t\,\frac{\Gamma(b_1+t)\cdot\Gamma(b_2+t)\cdots\Gamma(b_q+t)}{\Gamma(c_1+t)\cdot\Gamma(c_2+t)\cdots\Gamma(c_q+t)},$$

where $C$, $B$ and $g(w)$ are independent of $t$, and $b_i < c_i$ 
$(i = 1, 2, \cdots, q)$. Wilks has shown that the solution of the integral equation, 
$g(w)$, is given by the following expression: 

kw bg 

g( w ) =- 


- 0 - 5 ) 


7q-Pq-l 


(4) 


where 


and 




lbs—«» 


n 1 ... f vl '- b '- 1 v c t *- b *- 1 . . . vjl -! 1- **- 1-1 

Jo 

X (1 - ... (1 - 

X J ^ 1 - »i(i - - {t)i + v 2 (l - Vi)} ^1 - Jj^j 

X £l — {t>j + t> 2 (l — vj) + • • • 

+ - fi)(l — »*) — (1 — y«_2)! ^ 1 - 


q-l-C 


X dv i dv 2 • • • dv q -i 


*-n 


Tied 


ti r(6*)r( Cj - b t ) 


*-l i-l 

T» ®* jL# ft ” 1C 


1,-0 


>-0 


4 S. S. Wilks, “Certain generalizations in the analysis of variance,” Biometrika , Vol. 24 
(1932), pp. 474-5. 
the range of $w$ being $0 \le w \le B$. Wilks has furthermore shown that 

(5)
$$\{v_1 + v_2(1-v_1) + \cdots + v_i(1-v_1)(1-v_2)\cdots(1-v_{i-1})\}\left(1-\frac{w}{B}\right) < 1$$

for $w > 0$ and $0 \le v_i \le 1$ $(i = 1, 2, \cdots, q-1)$. 

We denote the left hand side of (5) by $\xi_i$. The factor $(1-\xi_i)^{b_i-c_{i+1}}$ can be 
expanded in a power series, i.e. 

(6)
$$(1-\xi_i)^{b_i-c_{i+1}} = (1-\xi_i)^{-(c_{i+1}-b_i)} = 1 + (c_{i+1}-b_i)\,\xi_i + \tfrac12(c_{i+1}-b_i)(c_{i+1}-b_i+1)\,\xi_i^2 + \cdots$$

with a radius of convergence equal to one. Since we will show shortly that for 
the choices we make for the $b_i$'s and $c_i$'s, $c_{i+1} \ge b_i$, then all coefficients in this 
last expansion are non-negative. Substituting this series expansion (6) in (4), 
and ordering it according to powers of $(1 - w/B)$, the expression under the 
integral sign (in 4) becomes 

(7)
$$\theta_0(v_1, \cdots, v_{q-1}) + \theta_1(v_1, \cdots, v_{q-1})\left(1-\frac wB\right) + \theta_2(v_1, \cdots, v_{q-1})\left(1-\frac wB\right)^2 + \cdots.$$

This series is uniformly convergent over the domain defined by the inequalities 
$0 \le v_i \le 1$ and $|1 - w/B| < 1$. We can even say that 
(7) is uniformly convergent for $|1 - w/B| < 1$ if we substitute for each $\theta_i$ 
the maximum of $\theta_i$ with respect to $v_1, v_2, \cdots, v_{q-1}$. Hence we may integrate 
the series (7) with respect to $v_1, v_2, \cdots, v_{q-1}$ term by term, i.e. 

(8)
$$\int\!\!\cdots\!\!\int (7)\;dv_1\,dv_2\cdots dv_{q-1} = \sigma_0 + \sigma_1\left(1-\frac wB\right) + \sigma_2\left(1-\frac wB\right)^2 + \cdots$$

and the series (8) is uniformly convergent for $|1 - w/B| < 1$. The coefficients 
$\sigma_0, \sigma_1, \cdots$ are non-negative. 

The case of the $\lambda$ statistic which we are considering is a special case of this 
integral equation which we obtain by making the following substitutions: 

$$w = \lambda, \quad B = 1, \quad u = r + p_1, \quad q = p - p_1,$$

$$b_r = \tfrac12(n-u+1), \quad c_r = \tfrac12(n-u+u'+1), \quad (r = 1, 2, \cdots, p - p_1).$$

Note that then 

$$c_{r+1} - b_r = \tfrac12[(u+1)' - 1] \ge 0.$$

Hence, according to (4) 

$$g(\lambda)\,d\lambda = k\cdot\lambda^{\frac12(n-p-1)}\,(1-\lambda)^{r-1}\,\{\sigma_0 + \sigma_1(1-\lambda) + \sigma_2(1-\lambda)^2 + \cdots\}\,d\lambda$$

where the infinite series converges for $|1 - \lambda| < 1$. 

Now $v = -\log\lambda$, or $\lambda = e^{-v}$, hence 

$$h(v)\,dv = k\cdot e^{-\frac12(n-p+1)v}\,v^{r-1}\left(\frac{1-e^{-v}}{v}\right)^{r-1}\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}\,dv$$
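where the series $\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}$ is obtained from the series $\{\sigma_0 + 
\sigma_1(1-\lambda) + \cdots\}$ by substituting for $(1 - \lambda)$ the Taylor expansion of $(1 - e^{-v})$. 
The series $\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}$ has a finite radius of convergence.⁵ 
Hence the function $\bar\psi(v)$ can be written as 

$$\bar\psi(v) = \frac{k\,\Gamma(r)}{c_n}\;e^{\frac12(p-1)v}\left(\frac{1-e^{-v}}{v}\right)^{r-1}\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}.$$

Since $e^{\frac12(p-1)v}\left(\dfrac{1-e^{-v}}{v}\right)^{r-1}$ can be 
expanded in a Taylor series around $v = 0$, Proposition 4 is proved. 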


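4. Evaluation of the coefficients in the expansion of $\bar\psi(v)$. Let the series 
expansion of $\bar\psi(v)$ be 

$$\bar\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots.$$

Then we have 

$$\int_0^\infty c_n\,e^{-\frac12 nv}\,\frac{v^{r-1}}{\Gamma(r)}\,(\alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots)\,dv = 1.$$

Now let $v^* = \dfrac n2\,v$, then 

$$c_n\left(\frac 2n\right)^r\int_0^\infty \frac{e^{-v^*}\,v^{*\,r-1}}{\Gamma(r)}\left(\alpha_0 + \frac{2\alpha_1 v^*}{n} + \frac{4\alpha_2 v^{*2}}{n^2} + \cdots\right)dv^* = 1.$$

Suppose that the asymptotic expansion of $\left(\dfrac n2\right)^r\dfrac{1}{c_n}$ is given by 

$$\beta_0 + \frac{\beta_1}{n} + \frac{\beta_2}{n^2} + \cdots.$$

On account of Proposition 3, we have that the asymptotic expansion in powers 
of $1/n$ of 

(9)
$$\int_0^{\sqrt n/2}\frac{e^{-v^*}\,v^{*\,r-1}}{\Gamma(r)}\left(\alpha_0 + \frac{2\alpha_1 v^*}{n} + \frac{4\alpha_2 v^{*2}}{n^2} + \cdots\right)dv^*$$

must be equal to the asymptotic expansion of $\left(\dfrac n2\right)^r\dfrac{1}{c_n}$. Since we may integrate 
in (9) term by term for sufficiently large $n$, we easily obtain 

$$\alpha_0 = \beta_0, \qquad \alpha_1 = \frac{\beta_1}{2r}, \qquad \cdots, \qquad \alpha_k = \frac{\beta_k}{2^k\,r(r+1)\cdots(r+k-1)}.$$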


5 See A. Gutzmer, Theorie der Eindeutigen Analytischen Funktionen, 1906, pp. 91-2. 
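The asymptotic expansion of $\left(\dfrac n2\right)^r\dfrac{1}{c_n}$ can be calculated in the following manner: 

$$\left(\frac n2\right)^r\frac{1}{c_n} = \beta_0 + \frac{\beta_1}{n} + \frac{\beta_2}{n^2} + \cdots$$

and 

$$\left(\frac{n+2}{2}\right)^r\frac{1}{c_{n+2}} = \beta_0 + \frac{\beta_1}{n+2} + \frac{\beta_2}{(n+2)^2} + \cdots = \left[\beta_0 + \frac{\beta_1}{n} + \frac{\beta_2}{n^2} + \cdots\right](1 + 2/n)^r\prod_u\frac{n-u+1}{n-u+u'+1}.$$

Equating the right hand members of these last two equations, and taking 
logs, we obtain 

$$\log\left[\beta_0 + \frac{\beta_1}{n+2} + \frac{\beta_2}{(n+2)^2} + \cdots\right] = r\log(1 + 2/n) + \sum_u\log\left(1 - \frac{u-1}{n}\right) - \sum_u\log\left(1 - \frac{u-u'-1}{n}\right) + \log\left(\beta_0 + \frac{\beta_1}{n} + \frac{\beta_2}{n^2} + \cdots\right).$$

Then we expand each term in a series of powers of $1/n$ and equate coefficients 
of $1/n^i$ for each $i$. We obtain the following formulae for the first five $\beta$'s: 

$$\beta_0 = 1$$

$$\beta_1 = r + \tfrac14\sum_u(u-1)^2 - \tfrac14\sum_u(u-u'-1)^2$$

$$\beta_2 = \beta_1 + \tfrac12\beta_1^2 - \tfrac23 r + \tfrac1{12}\sum_u(u-1)^3 - \tfrac1{12}\sum_u(u-u'-1)^3$$

$$\beta_3 = -\tfrac43\beta_1 - \beta_1^2 - \tfrac13\beta_1^3 + \beta_1\beta_2 + 2\beta_2 + \tfrac23 r + \tfrac1{24}\sum_u(u-1)^4 - \tfrac1{24}\sum_u(u-u'-1)^4$$

$$\beta_4 = 2\beta_1 + 2\beta_1^2 + \beta_1^3 + \tfrac14\beta_1^4 - 3\beta_1\beta_2 + \beta_1\beta_3 - \beta_1^2\beta_2 - 4\beta_2 + \tfrac12\beta_2^2 + 3\beta_3 - \tfrac45 r + \tfrac1{40}\sum_u(u-1)^5 - \tfrac1{40}\sum_u(u-u'-1)^5.$$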
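The formulae for $\beta_1, \ldots, \beta_4$ lend themselves to direct computation. The sketch below is one way to evaluate them in exact rational arithmetic for any grouping, with $u'$ taken as the number of variates in all preceding groups and $r = \tfrac12\sum_u u'$ as defined in Section 3. The function names are hypothetical; the results can be compared with the entries of the table of $\beta$'s, for example 5, 19, 65, 211 for the grouping (2, 2).

```python
from fractions import Fraction


def u_pairs(groups):
    """(u, u') for every variate outside the first group, with u counted from 1."""
    pairs, preceding, u = [], 0, 0
    for size in groups:
        for _ in range(size):
            u += 1
            if preceding > 0:
                pairs.append((u, preceding))
        preceding += size
    return pairs


def betas(groups):
    """Return r and [beta_0, ..., beta_4] for the given grouping."""
    pairs = u_pairs(groups)
    r = Fraction(sum(up for _, up in pairs), 2)

    def s(k):
        # sum over u of (u-1)^k - (u-u'-1)^k
        return Fraction(sum((u - 1) ** k - (u - up - 1) ** k for u, up in pairs))

    b1 = r + s(2) / 4
    b2 = b1 + b1 ** 2 / 2 - Fraction(2, 3) * r + s(3) / 12
    b3 = (-Fraction(4, 3) * b1 - b1 ** 2 - b1 ** 3 / 3 + b1 * b2 + 2 * b2
          + Fraction(2, 3) * r + s(4) / 24)
    b4 = (2 * b1 + 2 * b1 ** 2 + b1 ** 3 + b1 ** 4 / 4 - 3 * b1 * b2
          + b1 * b3 - b1 ** 2 * b2 - 4 * b2 + b2 ** 2 / 2 + 3 * b3
          - Fraction(4, 5) * r + s(5) / 40)
    return r, [Fraction(1), b1, b2, b3, b4]


for grouping in [(2, 1), (1, 1, 1), (2, 2), (3, 2)]:
    r, b = betas(grouping)
    print(grouping, float(r), [float(x) for x in b])
```

Exact fractions are used because the higher $\beta$'s are differences of large terms, and rounding would otherwise creep into the last decimals.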


5. Practical use of the series. In practical applications, the value of the 
statistic, say $\lambda_0$, is calculated, and it is desired that we determine whether or 
not this value of the statistic falls into the critical region. That is, for a particular 
grouping of the variates, for a particular number of degrees of freedom, and 
for a chosen level of significance $\alpha$, there is determined from the distribution of 
$\lambda$, a value $\lambda^*$ such that 

$$\text{Prob}\,[\lambda < \lambda^*] = \alpha,$$

and if $\lambda_0 < \lambda^*$ we reject the hypothesis that in the population from which the 
sample is taken all the correlation coefficients between variates in different 
groups are zero. 

Since $v$ is a monotonic decreasing function of $\lambda$ we make the test by computing 
$v_0 = -\log\lambda_0$ and we reject the hypothesis if $v_0 > v^*$ where $v^* = -\log\lambda^*$. But 
this is equivalent to computing Prob $[v > v_0]$ and if this value is less than $\alpha$ we 
reject the hypothesis. Now 
$$\text{Prob}\,[v > v_0] = G_{v_0}(r_1, r_2, \cdots, r_k) = \frac{c_n}{\Gamma(r)}\int_{v_0}^{\infty} e^{-\frac12 nv}\,v^{r-1}\,(1 + \alpha_1 v + \alpha_2 v^2 + \cdots)\,dv.$$

Setting $g = \dfrac n2\,v$, 

$$\text{Prob}\,[v > v_0] = \left(\frac 2n\right)^r\frac{c_n}{\Gamma(r)}\int_{nv_0/2}^{\infty} e^{-g}\,g^{r-1}\left[1 + \alpha_1\frac{2g}{n} + \alpha_2\left(\frac{2g}{n}\right)^2 + \cdots\right]dg.$$

On account of Proposition 3 we obtain an asymptotic expansion of Prob $[v > v_0]$ 
by integrating the right hand member of the above equation term by term. 
This can be expressed by means of the incomplete gamma function, which is 
tabulated⁶ in the form 

$$I(u, p) = \frac{\displaystyle\int_0^{u\sqrt{p+1}} v^p e^{-v}\,dv}{\Gamma(p+1)}.$$

We obtain 

$$\text{Prob}\,[v > v_0] = \left(\frac 2n\right)^r c_n\left\{\left[1 - I\left(\frac{nv_0}{2\sqrt r},\ r-1\right)\right] + \frac{\beta_1}{n}\left[1 - I\left(\frac{nv_0}{2\sqrt{r+1}},\ r\right)\right] + \frac{\beta_2}{n^2}\left[1 - I\left(\frac{nv_0}{2\sqrt{r+2}},\ r+1\right)\right] + \cdots\right\}.$$

The values of the constant $K = \left(\dfrac 2n\right)^r c_n$ and the values of $\beta_1, \beta_2, \beta_3, \beta_4$ are 
herein tabulated for any grouping which might be made on six or fewer variates. 
Some cases, such as groupings $(1, p - 1)$, in which case the distribution of $\lambda$ 
is the distribution of the multiple correlation coefficient; and as the groupings 
$(2, p - 2)$, the exact distribution for which was given by Wilks as an incomplete 
Beta-function, are superfluous here. These cases are included only for the sake 
of completeness. 
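The series for Prob $[v > v_0]$ can be sketched in a few lines, under the following assumptions: the constant is taken as $K = (2/n)^r c_n$ with $c_n$ the product of Gamma-function ratios, each bracket $1 - I(\cdot,\cdot)$ is evaluated as the regularised upper incomplete gamma function $Q(r+j,\ nv_0/2)$, and the $\beta$'s are supplied by the caller (for instance from the table). All function names are hypothetical, and the incomplete gamma routine is a simple power series written for this illustration rather than a library call.

```python
from math import exp, lgamma, log


def upper_gamma_q(a, x):
    """Regularised upper incomplete gamma Q(a, x) = 1 - P(a, x), series for P."""
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return 1.0 - exp(a * log(x) - x - lgamma(a)) * total


def constant_k(n, pairs):
    """K = (2/n)**r * c_n and r, for a list of (u, u') pairs."""
    r = 0.5 * sum(up for _, up in pairs)
    log_c = sum(lgamma(0.5 * (n - u + 1 + up)) - lgamma(0.5 * (n - u + 1))
                for u, up in pairs)
    return exp(r * log(2.0 / n) + log_c), r


def tail_probability(v0, n, pairs, betas):
    """Truncated series for Prob[v > v0]; betas = [beta_0, beta_1, ...]."""
    k_const, r = constant_k(n, pairs)
    x = 0.5 * n * v0
    return k_const * sum(b / n ** j * upper_gamma_q(r + j, x)
                         for j, b in enumerate(betas))


# Grouping (2, 1): one variate u = 3 with u' = 2; beta_j = 2**j from the table.
print(tail_probability(0.3, 20, [(3, 2)], [1, 2, 4, 8, 16]))
print(exp(-0.5 * 18 * 0.3))  # exact value, since lambda ~ Beta((n-2)/2, 1) here
```

For the grouping (2, 1) the exact tail is $e^{-\frac12(n-2)v_0}$, so the two printed numbers give a direct feel for the truncation error with five terms.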


6 K. Pearson (Editor), Tables of the Incomplete Gamma Function, Biometric Laboratory, 
London, 1922. 
Table of the First Four $\beta$'s 

Grouping        r      β_1      β_2          β_3           β_4
2,1             1      2        4            8             16
1,1,1           1.5    2.75     6.28125      13.38281      27.57568
3,1             1.5    3.75     12.03125     36.91406      111.55225
2,2             2      5        19           65            211
2,1,1           2.5    5.75     23.53125     83.97656      279.50538
1,1,1,1         3      6.5      28.625       106.9375      366.39844
4,1             2      6        28           120           496
3,2             3      9        55           285           1351
3,1,1           3.5    9.75     62.53125     334.10156     1615.91163
2,2,1           4      11       77           439           2229
2,1,1,1         4.5    11.75    86.03125     506.16406     2628.23974
1,1,1,1,1       5      12.5     95.625       580.6875      3085.52344
5,1             2.5    8.75     55.78125     315.82031     1690.65282
4,2             4      14       125          910           5901
3,3             4.5    15.75    154.03125    1205.03906    8277.55226
4,1,1           4.5    14.75    136.28125    1015.50781    6693.45068
3,2,1           5.5    17.75    189.53125    1584.10156    11445.75538
2,2,2           6      19       214          1866          13947
3,1,1,1         6      18.5     203.625      1740.9375     12797.27344
2,2,1,1         6.5    19.75    229.03125    2042.16406    15530.08351
2,1,1,1,1       7      20.5     244.625      2230.1875     17257.64836
1,1,1,1,1,1     7.5    21.25    260.78125    2430.49219    19139.02892
Tables of the Constant $K = (2/n)^r c_n$ 

  n     2,1    1,1,1   3,1    2,2    2,1,1  1,1,1,1  4,1    3,1,1
 10    .800   .738   .646   .560   .517   .477    .480   .310
 11    .818   .761   .676   .595   .553   .515    .521   .352
 12    .833   .780   .702   .625   .585   .548    .556   .390
 13    .846   .796   .724   .651   .612   .576    .586   .424
 14    .857   .810   .743   .674   .637   .602    .612   .455
 15    .867   .822   .759   .693   .658   .624    .636   .482
 16    .875   .833   .774   .711   .677   .645    .656   .508
 17    .882   .843   .787   .727   .694   .663    .675   .531
 18    .889   .851   .798   .741   .709   .679    .691   .552
 19    .895   .859   .808   .754   .723   .694    .706   .571
 20    .900   .866   .818   .765   .736   .708    .720   .589
 22    .909   .878   .834   .785   .758   .732    .744   .620
 24    .917   .888   .847   .802   .777   .752    .764   .647
 26    .923   .896   .859   .817   .793   .770    .781   .671
 28    .929   .903   .869   .829   .807   .785    .796   .691
 30    .933   .910   .877   .840   .819   .798    .809   .710
 35    .943   .922   .894   .862   .843   .825    .835   .747
 40    .950   .932   .908   .879   .862   .846    .855   .776
 45    .956   .940   .918   .892   .877   .862    .871   .799
 50    .960   .946   .926   .902   .889   .875    .883   .818
 55    .964   .950   .932   .911   .899   .886    .894   .833
 60    .967   .954   .938   .918   .907   .895    .902   .846
 65    .969   .958   .943   .924   .914   .903    .910   .858
 70    .971   .961   .947   .930   .920   .910    .916   .867
 80    .975   .966   .953   .938   .930   .921    .926   .883
 90    .978   .970   .959   .945   .937   .929    .934   .896
100    .980   .973   .963   .951   .943   .936    .941   .906
Tables of the Constant $K = (2/n)^r c_n$ 

  n     2,2,1  2,1,1,1  3,2    1,1,1,1,1  5,1    4,2    3,3
 10    .269   .248    .336   .229      .323   .168   .136
 11    .310   .288    .379   .268      .369   .206   .171
 12    .347   .325    .417   .304      .410   .243   .205
 13    .381   .359    .451   .338      .445   .277   .237
 14    .412   .390    .481   .368      .478   .309   .268
 15    .441   .418    .508   .397      .506   .339   .297
 16    .467   .444    .533   .423      .532   .367   .324
 17    .490   .468    .556   .447      .555   .392   .350
 18    .512   .490    .576   .470      .576   .416   .374
 19    .532   .511    .595   .490      .596   .438   .396
 20    .551   .530    .612   .510      .613   .459   .417
 22    .584   .564    .642   .544      .644   .496   .455
 24    .613   .593    .668   .575      .671   .529   .489
 26    .638   .619    .691   .601      .694   .558   .519
 28    .660   .642    .711   .625      .714   .584   .546
 30    .680   .662    .728   .646      .731   .607   .570
 35    .720   .704    .764   .689      .767   .654   .621
 40    .751   .737    .791   .723      .794   .692   .661
 45    .776   .763    .813   .751      .816   .722   .694
 50    .797   .785    .830   .773      .833   .747   .721
 55    .814   .803    .845   .792      .848   .768   .743
 60    .828   .818    .857   .808      .860   .786   .762
 65    .841   .831    .868   .822      .870   .801   .779
 70    .852   .842    .877   .833      .879   .814   .793
 80    .869   .861    .892   .853      .894   .836   .817
 90    .883   .876    .903   .869      .905   .853   .836
100    .894   .888    .913   .881      .915   .867   .852





Tables of the Constant $K = (2/n)^r c_n$ 

  n     4,1,1  3,2,1  2,2,2  3,1,1,1  2,2,1,1  2,1,1,1,1  1,1,1,1,1,1
 10    .155   .108   .094   .100    .087    .080      .076
 11    .192   .140   .123   .130    .114    .106      .099
 12    .228   .171   .152   .160    .142    .133      .125
 13    .261   .201   .180   .189    .170    .160      .150
 14    .292   .230   .208   .217    .197    .186      .176
 15    .322   .257   .235   .244    .223    .212      .201
 16    .349   .284   .261   .270    .248    .236      .225
 17    .375   .309   .285   .295    .272    .260      .248
 18    .398   .332   .308   .318    .295    .283      .271
 19    .421   .354   .330   .340    .317    .304      .292
 20    .442   .375   .351   .361    .338    .325      .313
 22    .479   .414   .390   .400    .376    .363      .351
 24    .512   .448   .424   .434    .411    .398      .385
 26    .542   .479   .456   .465    .442    .430      .417
 28    .568   .507   .484   .493    .471    .458      .446
 30    .591   .532   .510   .519    .497    .484      .472
 35    .640   .585   .564   .573    .552    .540      .528
 40    .679   .628   .608   .616    .597    .585      .574
 45    .710   .663   .644   .652    .633    .623      .612
 50    .736   .692   .674   .681    .664    .654      .644
 55    .758   .716   .700   .706    .690    .681      .671
 60    .776   .737   .722   .728    .712    .704      .695
 65    .792   .755   .740   .746    .732    .723      .715
 70    .805   .771   .757   .762    .749    .741      .733
 80    .828   .797   .784   .789    .777    .770      .762
 90    .846   .818   .806   .811    .800    .793      .786
100    .860   .835   .824   .828    .818    .812      .806



THE MEAN SQUARE SUCCESSIVE DIFFERENCE 


By J. von Neumann, 1 R. H. Kent, H. R. Bellinson and B. I. Hart 
Aberdeen Proving Ground 


1. Introduction. In making measurements, every precaution is generally 
taken to hold the conditions of the experiment constant, in order that the 
population, whose parameters are to be estimated from the observations, shall 
remain fixed throughout the experiment. One wishes each observation to come 
from the same population, or what is the same thing if normality is assumed, 
from populations having the same means and standard deviations. 

There are cases, however, where the standard deviation may be held constant, 
but the mean varies from one observation to the next. If no correction is made 
for such variation of the mean, and the standard deviation is computed from 
the data in the conventional way, then the estimated standard deviation will 
tend to be larger than the true population value. When the variation in the 
mean is gradual, so that a trend (which need not be linear) is shifting the mean 
of the population, a rather simple method of minimizing the effect of the trend 
on dispersion is to estimate standard deviation from differences. It is for this 
purpose that the mean square successive difference 

$$\delta^2 = \frac{\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2}{n-1}$$

is suggested. The subscript $i$ in this expression refers to the temporal order of 
the observation $x_i$. 

In using $\delta^2$ for estimating standard deviation, the distribution of $\delta^2$ in random 
samples is of interest, since questions of bias, efficiency, and confidence interval 
require consideration. $\delta^2$ may be used, in addition, to determine whether a 
trend actually exists; in this case one must know whether $\delta^2$ differs significantly 
from 

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n}$$

which measures variance independently of the order of the observations, and 
consequently includes the effect of the trend. 
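The contrast between the two statistics is easy to see on a small series with a trend. The sketch below assumes the divisor $n - 1$ for $\delta^2$ and the divisor $n$ for $s^2$ (the latter is the reading consistent with the relation $\delta^2 = 4s^2$ for $n = 2$ stated later in the paper); the function names are hypothetical.

```python
def mean_square_successive_difference(xs):
    """delta^2: mean of the squared differences between consecutive observations."""
    n = len(xs)
    return sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)


def variance_about_mean(xs):
    """s^2: mean squared deviation from the sample mean (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# A steadily drifting series: the trend inflates s^2 far more than delta^2 / 2.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(mean_square_successive_difference(series) / 2.0)
print(variance_about_mean(series))
```

Because $\delta^2$ estimates $2\sigma^2$, it is $\delta^2/2$ that should be compared with $s^2$; a ratio well below one points to a trend.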


1 Institute for Advanced Study, Princeton, N. J. Also member of Scientific Advisory 
Committee of the Ballistic Research Laboratory, Aberdeen Proving Ground. 
The distribution of $\delta^2$ is considered in this paper; it is hoped that others will 
shortly publish methods of estimating the probability that $\delta^2 \le k s^2$ as a function 
of $k$ and the sample size $n$. 


2. History. A somewhat similar procedure is suggested by “Student” [1] 
and E. S. Pearson [2] who consider the situation in which a shift may occur in 
the mean of the population, but where pairs of observations may be made with 
no shift in mean between them; standard deviation may be estimated from the 
differences between these pairs. The method can be generalized, and 

$$s' = \sqrt{\frac{\sum_{i=1}^{n/2}(x_{2i} - x_{2i-1})^2}{n}}$$

is an estimate of the standard deviation; $n$ must, of course, be an even integer. 
This estimate has the advantage that its properties are fully known: $s'$ is 
distributed as the standard deviation with $f = n/2$ degrees of freedom. It will be 
noted that this estimate does not involve the successive differences, but only 
the alternate ones. Although there are $n - 1$ available successive differences, 
this estimate uses only the $n/2$ independent differences. The mean square 
successive difference is based on all $n - 1$ successive differences, and should 
therefore provide a more efficient estimate of $\sigma$ than does $s'$. 

There is, of course, nothing new in the concept of estimating the standard 
deviation from differences. Even as far back as 1870, an interest in the method 
appears to have existed. Jordan [3] devised methods based on sums of powers 
of the differences. Helmert [4] gave more careful consideration to the case of 
the first power, i.e. the sum of the absolute differences. In both these cases, 
however, all the n(n — l)/2 differences that can be established from a sample of 
n observations were included in the estimate, so that the estimate was of no 
value in reducing the effect of a trend. Helmert realized this, for he pointed 
out that the estimate obtained from the sum of squares of the differences is 
exactly that obtained by the more conventional procedure of squaring deviations 
from the mean. 

The usefulness of the differences between successive observations only appears 
to have been realized first by ballisticians, who faced the problem of minimizing 
effects due to wind variation, heat and wear in measuring the dispersion of the 
distance traveled by shell. Vallier [5] appears to have been the first to estimate 
dispersion from successive differences. Cranz and Becker [6] commended the 
mean successive difference 

$$E_d = \frac{\sum_{i=1}^{n-1}|x_{i+1} - x_i|}{n-1}.$$

To establish the precision of $E_d$ in estimating $\sigma$, Cranz and Becker quoted 
Helmert's paper, and so erred in saying that their method was superior to that 
of the mean deviation. Helmert's procedure, based on $n(n-1)/2$ differences, 
is indeed more precise (for $n > 10$) than the mean deviation 

$$\text{M.D.} = \frac{\sum_{i=1}^{n}|x_i - \bar x|}{n}$$

but the mean successive difference is based on but $n - 1$ differences, and so is 
not as precise. 

Bennett [7] appears to have suggested the use of successive differences 
independently of the European ballisticians. In recent years, the method of 
estimation by the mean square successive difference $\delta^2$ was put into practice in the 
Ballistic Research Laboratory at the Aberdeen Proving Ground, U. S. Army, 
by L. S. Dederick. 


3. Bias and efficiency. The moments of $\delta^2$ in samples drawn from a normal 
population are derived in Section 6 of this paper. The moments are used at 
this point to establish the estimate of variance, and the efficiency of this estimate. 
The mean value of $\delta^2$ in samples taken at random from a normal population is 

(3)
$$E(\delta^2) = 2\sigma^2.$$

$\delta^2$ consequently offers an unbiased estimate of variance, and this estimate is 

(4)
$$\frac{\delta^2}{2} = \frac{\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2}{2(n-1)}.$$

The second moment, i.e., the variance, of $\delta^2$ in samples of size $n$ is 

(5)
$$\sigma^2_{\delta^2} = \frac{4(3n-4)}{(n-1)^2}\,\sigma^4.$$

As the sample size is increased, the distribution of $\delta^2$ appears to approach 
the normal. It is therefore appropriate to consider the efficiency as defined by 
Fisher [8]. Accordingly, the efficiency of $\delta^2$ is 

$$\frac{\sigma^2_{s^2}\,/\,[E(s^2)]^2}{\sigma^2_{\delta^2}\,/\,[E(\delta^2)]^2}.$$

Since 

$$\sigma^2_{s^2} = \frac{2(n-1)}{n^2}\,\sigma^4$$

and 

$$E(s^2) = \frac{n-1}{n}\,\sigma^2,$$

the efficiency of $\delta^2$ in estimating the standard deviation is 

(6)
$$\frac{2(n-1)}{3n-4}.$$

The efficiency is unity for $n = 2$, since in this case the two statistics have 
the same distribution. It therefore appears that the efficiency decreases as the 
sample size increases, but approaches 2/3 as a limiting value for $n$ very large. 


4. Summary of procedure. Having a statistic which estimates a parameter 
of a population, it is desirable to know the distribution of that statistic as 
computed from samples taken at random from that population. At present, the 
distribution of $\delta^2$ in samples of $n$ has not been obtained. The difficulty is in the 
fact that the successive differences are not independent. The first difference, 
$d_1 = x_2 - x_1$, and the second difference, $d_2 = x_3 - x_2$, are related in that they 
both involve $x_2$. Similar correlation exists between every successive pair of 
differences between successive observations. 

For $n = 2$, and samples taken from a normal population, the distribution of 
$\delta^2$ is known. Since 

$$\delta^2 = (x_2 - x_1)^2 = 2\sum_{i=1}^{2}(x_i - \bar x)^2 = 4s^2,$$

the distribution of $\delta^2$ is similar to that of $s^2$ for this sample size. 

For $n = 3$, the distribution of $\delta^2$ has been derived analytically. The derivation 
is indicated in Section 5 of this paper. For $n > 3$, only the moments of 
the distribution have thus far been obtained. A Pearson type distribution has 
been fitted to the first three moments to obtain an approximate representation 
of the true distribution. 


5. Distribution of $\delta^2$. In the case of a sample of $n$ taken from a normal 
population, the probability that the first observation lies between $x_1$ and $x_1 + dx_1$, 
while the second lies between $x_2$ and $x_2 + dx_2$, etc., is 

(7)
$$\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n e^{-(x_1^2 + x_2^2 + \cdots + x_n^2)/2\sigma^2}\,dx_1\,dx_2\cdots dx_n.$$

If $y_i = x_{i+1} - x_i$, this expression becomes 

(8)
$$\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n e^{-Q(x_1,\,y_1,\,y_2,\,\cdots,\,y_{n-1})/2\sigma^2}\,dx_1\,dy_1\,dy_2\cdots dy_{n-1},$$

where $Q$ is a quadratic form in $x_1$ and the $y$'s. Since 

$$(n-1)\,\delta^2 = \sum_{i=1}^{n-1} y_i^2,$$
the probability that $\delta^2$ shall be less than some value $\delta_0^2$ is 

(9)
$$P(\delta^2 < \delta_0^2) = \underset{\sum y_i^2\, <\, (n-1)\delta_0^2}{\int\!\!\int\cdots\int}\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n e^{-Q/2\sigma^2}\,dx_1\,dy_1\cdots dy_{n-1}.$$

After the integration with respect to $x_1$ is carried out, the quadratic form in 
the exponent may be normalized by a transformation to new coordinates $z_i$ 
linearly related to the $y$'s. The $z$'s may be so chosen that all the terms $z_i^2$ in 
the exponent have the same coefficient, in which case 


( 10 ) 




dzidbi <kn-i. 

y Z%~\) 


As a result of such a transformation, the sphere of integration in (9) becomes an 
ellipsoid in (10). By changing to polar coordinates, with 


( 11 ) 



t-i 


P{b l < &l) = c, ff e~ kT * r n ~ 2 dSl dr, 


l n which 0 is the solid angle in the space of n — 1 dimensions. The limits of 
integration with respect to il as a function of r must be found; this involves the 
evaluation of the solid angle subtended by the surface bounded by the inter¬ 
section of the (n — 1 )-dimensional sphere and the (n — l)-dimensional ellipsoid. 
If 12 = 

(12) Ptf < Si) = c 2 £ <r* P V(r)r n ~ 2 dr, 


in which a is the longest semi-axis of the (n — 1 )-dimensional ellipsoid cor¬ 
responding to the given value of 6 2 . 

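For $n = 3$, (9) becomes

$$P(\delta^2 < \delta_0^2) = \left[\frac{1}{\sigma\sqrt{2\pi}}\right]^3 \iiint_{y_1^2 + y_2^2 < 2\delta_0^2} \exp\left[-\frac{x_1^2 + (x_1 + y_1)^2 + (x_1 + y_1 + y_2)^2}{2\sigma^2}\right] dx_1\, dy_1\, dy_2$$

$$= \frac{1}{2\pi\sigma^2\sqrt{3}} \iint_{y_1^2 + y_2^2 < 2\delta_0^2} \exp\left[-\frac{y_1^2 + y_1 y_2 + y_2^2}{3\sigma^2}\right] dy_1\, dy_2. \tag{13}$$

Normalizing the quadratic form in the exponent,

$$P(\delta^2 < \delta_0^2) = \frac{1}{2\pi\sigma^2\sqrt{3}} \iint_{z_1^2 + z_2^2 < 2\delta_0^2} e^{-(z_1^2 + \frac{1}{3} z_2^2)/2\sigma^2}\, dz_1\, dz_2, \tag{14}$$

and in polar coordinates

$$P(\delta^2 < \delta_0^2) = \frac{1}{2\pi\sigma^2\sqrt{3}} \int_0^{\delta_0\sqrt{2}} r \left[\int_0^{2\pi} e^{-r^2(\cos^2\theta + \frac{1}{3}\sin^2\theta)/2\sigma^2}\, d\theta\right] dr. \tag{15}$$

The integral in brackets can be shown to be a Bessel function of zero order;
for let

$$\frac{r^2}{3\sigma^2} = -2iu, \qquad \phi = \frac{\pi}{2} - 2\theta;$$

then

$$\int_0^{2\pi} e^{-r^2(\cos^2\theta + \frac{1}{3}\sin^2\theta)/2\sigma^2}\, d\theta = e^{2iu} \int_0^{2\pi} e^{iu\sin\phi}\, d\phi = 2\pi e^{2iu} J_0(u). \tag{16}$$

Consequently, (15) takes the form

$$P(\delta^2 < \delta_0^2) = \frac{1}{\sigma^2\sqrt{3}} \int_0^{\delta_0\sqrt{2}} r\, e^{-r^2/3\sigma^2}\, J_0\!\left(\frac{ir^2}{6\sigma^2}\right) dr = F(\delta_0^2). \tag{17}$$

The probability density function is

$$\frac{dF(\delta^2)}{d\delta^2} = \frac{1}{\sigma^2\sqrt{3}}\, e^{-2\delta^2/3\sigma^2}\, J_0\!\left(\frac{i\delta^2}{3\sigma^2}\right) = \frac{1}{\sigma^2\sqrt{3}}\, e^{-2\delta^2/3\sigma^2} \left[1 + \frac{\delta^4}{2^2\, 3^2 \sigma^4} + \frac{\delta^8}{2^2 4^2\, 3^4 \sigma^8} + \frac{\delta^{12}}{2^2 4^2 6^2\, 3^6 \sigma^{12}} + \cdots\right]. \tag{18}$$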
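As a numerical check on the density for samples of three, the sketch below sums the power series for $J_0(ix)$ (the modified Bessel function $I_0(x)$) and integrates the density with Simpson's rule. The function names, the upper limit of 120 and the step count are illustrative choices, not part of the paper; the result should show an area close to 1 and a mean close to $2\sigma^2$.

```python
import math


def bessel_i0(x):
    """I0(x) = J0(ix), summed from its power series sum_k (x/2)^(2k) / (k!)^2."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def density_n3(t, sigma=1.0):
    """Density of delta^2 at the value t for samples of three, as in (18)."""
    s2 = sigma * sigma
    return (math.exp(-2.0 * t / (3.0 * s2)) * bessel_i0(t / (3.0 * s2))
            / (s2 * math.sqrt(3.0)))


def simpson(f, a, b, steps):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0


# With sigma = 1 the density is negligible beyond t = 120.
area = simpson(density_n3, 0.0, 120.0, 24000)
mean = simpson(lambda t: t * density_n3(t), 0.0, 120.0, 24000)
print(area, mean)
```

The mean of $2\sigma^2$ agrees with the first moment obtained in Section 6.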


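6. Moments. The $t$-th moment of $\delta^2$ about the origin is defined by

$$\mu'_t = E(\delta^{2t}), \tag{19}$$

or

$$(n - 1)^t \mu'_t = E\left(\left[\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2\right]^t\right) = E\left(\left[2\sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2\sum_{i=1}^{n-1} x_{i+1} x_i\right]^t\right). \tag{20}$$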


For any value of $t$, the expansion can be performed, and similar terms
collected and enumerated. The values of $x$ can be considered as true errors, i.e.
as deviations from the true mean, without affecting the conclusions. If the
original population from which the samples have been drawn is normal, with
standard deviation $\sigma$, then:

$$E(x^r) = \begin{cases} 0, & r \text{ odd}, \\[4pt] \dfrac{(2k)!}{2^k\, k!}\, \sigma^{2k}, & r = 2k, \end{cases} \tag{21}$$

and since, in the null case where the mean of the population remains constant,
successive observations are independent, then

$$E(x_i^r x_j^s) = E(x^{r+s}), \quad i = j; \qquad E(x_i^r x_j^s) = E(x^r)\,E(x^s), \quad i \neq j. \tag{22}$$


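These relations are sufficient for the evaluation of $\mu'_t$. For example, in the
case of the second moment, $t = 2$:

$$(n - 1)^2 \mu'_2 = E\left[2\sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2\sum_{i=1}^{n-1} x_{i+1} x_i\right]^2. \tag{23}$$

Now:

$$\left[2\sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2\sum_{i=1}^{n-1} x_{i+1} x_i\right]^2 = 4\left[\sum_{i=1}^{n} x_i^2\right]^2 + (x_1^2 + x_n^2)^2 + 4\left[\sum_{i=1}^{n-1} x_{i+1} x_i\right]^2 - 4(x_1^2 + x_n^2)\sum_{i=1}^{n} x_i^2 + [\text{terms containing odd powers of } x_i]$$

$$= 4\left[\sum_{i=1}^{n} x_i^4 + \sum_{i \neq j} x_i^2 x_j^2\right] + \left[x_1^4 + 2x_1^2 x_n^2 + x_n^4\right] + 4\left[\sum_{i=1}^{n-1} x_{i+1}^2 x_i^2\right] - 4\left[x_1^4 + x_1^2 \sum_{i=2}^{n} x_i^2 + x_n^2 \sum_{i=1}^{n-1} x_i^2 + x_n^4\right] + [\text{terms containing odd powers of } x_i].$$

The mean of these terms is found by using (21) and (22), and the number of
each type of term present is enumerated:

$$4[n(3\sigma^4) + n(n - 1)\sigma^2\sigma^2] + [3\sigma^4 + 2\sigma^2\sigma^2 + 3\sigma^4] + 4[(n - 1)\sigma^2\sigma^2] - 4[3\sigma^4 + \sigma^2(n - 1)\sigma^2 + \sigma^2(n - 1)\sigma^2 + 3\sigma^4] = (4n^2 + 4n - 12)\sigma^4.$$

Consequently

$$\mu'_2 = \frac{4(n^2 + n - 3)}{(n - 1)^2}\, \sigma^4. \tag{24}$$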
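The second moment can be cross-checked without expanding the square. The sketch below relies on a standard identity that is not used in the paper: for a zero-mean normal vector $y$ with covariance matrix $C$, $E[(y'y)^2] = (\operatorname{tr} C)^2 + 2\operatorname{tr}(C^2)$. Here $y$ is the vector of successive differences, whose covariance (in units of $\sigma^2$) has 2 on the diagonal and $-1$ beside it. The function names are illustrative.

```python
from fractions import Fraction


def second_moment_by_trace(n):
    """E[(delta^2)^2] in units of sigma^4, from the covariance of the differences."""
    m = n - 1  # number of successive differences y_i = x_{i+1} - x_i
    cov = [[2 if i == j else (-1 if abs(i - j) == 1 else 0) for j in range(m)]
           for i in range(m)]
    trace = sum(cov[i][i] for i in range(m))
    trace_of_square = sum(cov[i][j] * cov[j][i] for i in range(m) for j in range(m))
    # Gaussian identity: E[(y'y)^2] = (tr C)^2 + 2 tr(C^2); delta^2 = y'y / (n - 1).
    return Fraction(trace ** 2 + 2 * trace_of_square, m ** 2)


def second_moment_by_formula(n):
    """Closed form 4(n^2 + n - 3) / (n - 1)^2, in units of sigma^4."""
    return Fraction(4 * (n * n + n - 3), (n - 1) ** 2)


for n in (2, 3, 5, 10):
    print(n, second_moment_by_trace(n), second_moment_by_formula(n))
```

Both routes give $(4n^2 + 4n - 12)\sigma^4/(n-1)^2$, since $\operatorname{tr} C = 2(n-1)$ and $\operatorname{tr}(C^2) = 6n - 8$.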


The first four moments about the origin were evaluated by this procedure,
and from these, the moments about the mean are readily determined. The
results are:

$$\mu'_1 = 2\sigma^2, \qquad \mu'_2 = \frac{4(n^2 + n - 3)}{(n - 1)^2}\, \sigma^4,$$

$$\mu'_3 = \frac{8(n^3 + 6n^2 + 2n - 21)}{(n - 1)^3}\, \sigma^6, \qquad \mu'_4 = \frac{16(n^4 + 14n^3 + 53n^2 - 8n - 231)}{(n - 1)^4}\, \sigma^8, \tag{25}$$

$$\mu_1 = 0, \qquad \mu_2 = \frac{4(3n - 4)}{(n - 1)^2}\, \sigma^4,$$

$$\mu_3 = \frac{32(5n - 8)}{(n - 1)^3}\, \sigma^6, \qquad \mu_4 = \frac{48(9n^2 + 46n - 112)}{(n - 1)^4}\, \sigma^8.$$

It should be noted at this point that the above fourth moment is incorrect
for $n = 2$. One of the terms in the expansion of the right side of (20), for
$t = 4$, is

$$x_1^2 x_n^2 \sum_{i=1}^{n-1} x_{i+1}^2 x_i^2.$$

For $n = 2$, the mean value of this term is

$$E(x_1^2 x_2^2 x_1^2 x_2^2) = E(x_1^4)\,E(x_2^4) = 9\sigma^8,$$

whereas for $n > 2$, the mean value is

$$E(x_1^4 x_2^2 x_n^2) + \cdots + E(x_1^2 x_{n-1}^2 x_n^4) = (n + 3)\sigma^8.$$


7. Pearson type fit to distribution of $\delta^2$. From the moments it is found that

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(5n - 8)^2}{(3n - 4)^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3(9n^2 + 46n - 112)}{(3n - 4)^2}. \tag{26}$$
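The step from the moments about the origin to $\beta_1$ and $\beta_2$ can be reproduced with exact rational arithmetic. The sketch below takes the four moments about the origin (in units of the appropriate power of $\sigma$), converts them to moments about the mean with the usual binomial relations, and forms the two ratios. The function names are illustrative; the check starts at $n = 3$ because the fourth moment does not hold for $n = 2$.

```python
from fractions import Fraction


def raw_moments(n):
    """Moments of delta^2 about the origin, in units of sigma^2, sigma^4, ..."""
    d = n - 1
    m1 = Fraction(2)
    m2 = Fraction(4 * (n**2 + n - 3), d**2)
    m3 = Fraction(8 * (n**3 + 6 * n**2 + 2 * n - 21), d**3)
    m4 = Fraction(16 * (n**4 + 14 * n**3 + 53 * n**2 - 8 * n - 231), d**4)
    return m1, m2, m3, m4


def central_from_raw(m1, m2, m3, m4):
    """Standard conversion from moments about the origin to moments about the mean."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu2, mu3, mu4


def betas(n):
    """Return (beta_1, beta_2) = (mu_3^2 / mu_2^3, mu_4 / mu_2^2)."""
    mu2, mu3, mu4 = central_from_raw(*raw_moments(n))
    return mu3**2 / mu2**3, mu4 / mu2**2


for n in (5, 10, 50):
    b1, b2 = betas(n)
    print(n, float(b1), float(b2))
```

For $n = 5$ and $n = 50$ the value of $\beta_2$ rounds to 8.504 and 3.475, the entries in the "True" column of Table I.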


As $n$ becomes large, $\beta_1$ and $\beta_2$ approach 0 and 3 respectively; the distribution
therefore appears to approach the normal for large samples. For finite sample
sizes, the values of $\beta_1$ and $\beta_2$ correspond to those of the Pearson Type VI
distribution,

$$y\!\left(\frac{\delta^2}{\sigma^2}\right) = c \left(\frac{\delta^2}{\sigma^2} + a_1\right)^{q_2} \left(\frac{\delta^2}{\sigma^2} + a_2\right)^{-q_1}.$$

The origin of this distribution is at $\delta^2 = -a_1\sigma^2$, but the origin of the true
distribution must be at $\delta^2 = 0$. By taking $a_1 = 0$ so that the origin is at $\delta^2 = 0$,
we obtain what appears to be a suitable approximation

$$y\!\left(\frac{\delta^2}{\sigma^2}\right) = c \left(\frac{\delta^2}{\sigma^2}\right)^{q_2} \left(\frac{\delta^2}{\sigma^2} + a_2\right)^{-q_1}. \tag{27}$$

The parameters are determined by equating the 1st, 2nd and 3rd moments of
(27) to the corresponding moments of the true distribution, with the result that
(with $\mu_2$ taken in units of $\sigma^4$)

$$q_2 = \frac{3n^4 - 10n^3 - 18n^2 + 79n - 60}{8n^3 - 50n + 48}, \qquad q_1 = \frac{4 - \mu_2 (q_2 + 1)(q_2 + 3)}{4 - \mu_2 (q_2 + 1)},$$

$$a_2 = \frac{2(q_1 - q_2 - 2)}{q_2 + 1}, \qquad c = \frac{a_2^{\,q_1 - q_2 - 1}}{B(q_2 + 1,\; q_1 - q_2 - 1)}. \tag{28}$$

Values of these parameters for selected values of $n$ are given in Table I. The
sixth and seventh columns of this table give the values of $\beta_2$ for the distribution
(27) and for the true distribution, respectively.
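The parameter formulas can be evaluated directly. The sketch below is an illustration only: it assumes the reading of (28) in which $\mu_2 = 4(3n-4)/(n-1)^2$ is expressed in units of $\sigma^4$, and it returns $\log_{10} c$ rather than $c$ because $c$ becomes very large as $n$ grows. The function name is hypothetical.

```python
import math


def pearson_type_vi_parameters(n):
    """Return (q1, q2, a2, log10 c) for the fitted curve (27), following (28)."""
    q2 = (3 * n**4 - 10 * n**3 - 18 * n**2 + 79 * n - 60) / (8 * n**3 - 50 * n + 48)
    mu2 = 4 * (3 * n - 4) / (n - 1) ** 2  # variance of delta^2 / sigma^2
    q1 = (4 - mu2 * (q2 + 1) * (q2 + 3)) / (4 - mu2 * (q2 + 1))
    a2 = 2 * (q1 - q2 - 2) / (q2 + 1)
    p = q1 - q2 - 1
    # log of the Beta function B(q2 + 1, p) via log-gamma, to avoid overflow.
    log_beta = math.lgamma(q2 + 1) + math.lgamma(p) - math.lgamma(q1)
    log10_c = (p * math.log(a2) - log_beta) / math.log(10)
    return q1, q2, a2, log10_c


for n in (5, 7, 10, 15, 20, 25, 50):
    q1, q2, a2, log10_c = pearson_type_vi_parameters(n)
    mantissa = 10 ** (log10_c - math.floor(log10_c))
    print(n, round(q1, 4), round(q2, 4), round(a2, 4),
          f"{mantissa:.4f} x 10^{math.floor(log10_c)}")
```

Working with `math.lgamma` keeps every intermediate quantity within floating-point range even for $n = 50$.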


TABLE I

 (1)      (2)        (3)        (4)            (5)              (6)         (7)        (8)
                                                             beta_2      beta_2      Ratio
  n       q_1        q_2        a_2             c             (27)        True      (6)/(7)

  5     24.4391     0.6391    26.6000    5.8800 x 10^34      8.807       8.504      1.036
  7     31.1286     1.3857    23.2571    4.9285 x 10^42      6.948       6.758      1.028
 10     41.2830     2.5079    20.9667    9.4934 x 10^54      5.658       5.538      1.022
 15     58.2113     4.3806    19.2659    4.0240 x 10^75      4.718       4.645      1.016
 20     75.1210     6.2543    18.4351    1.8063 x 10^96      4.269       4.217      1.012
 25     92.0189     8.1285    17.9417    8.1097 x 10^116     4.006       3.965      1.010
 50    176.4443    17.5018    16.9651    1.3386 x 10^220     3.494       3.475      1.005


The Tables of the Incomplete Beta-Function [9] can be used to evaluate the
probability integral of the distribution (27),

$$P(\delta^2 < \delta_0^2) = c \int_0^{\delta_0^2/\sigma^2} \left(\frac{\delta^2}{\sigma^2}\right)^{q_2} \left(\frac{\delta^2}{\sigma^2} + a_2\right)^{-q_1} d\!\left(\frac{\delta^2}{\sigma^2}\right) = 1 - I_x(q_1 - q_2 - 1,\; q_2 + 1), \qquad x = \frac{a_2}{a_2 + \delta_0^2/\sigma^2}, \tag{29}$$

for $n \leq 14$. For $n > 14$, the probability integral may be determined by
quadrature. Some values of the probability integral for $n = 50$ are given in Table II.
A comparison with the integral of the normal curve having the same first two
moments indicates that a sample of somewhat more than 50 is required before
the normal curve becomes a satisfactory approximation to the distribution (27).
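The normal comparison is easy to reproduce. The sketch below evaluates the integral of the normal curve whose mean and variance equal those of $\delta^2/\sigma^2$, namely 2 and $4(3n-4)/(n-1)^2$, using the complementary error function from the standard library. The function name is illustrative; for $n = 50$ the values should agree with the "Normal" column of Table II to about four decimals.

```python
import math


def normal_probability(t, n):
    """P(delta^2 / sigma^2 < t) under the normal curve with matching mean and variance."""
    mean = 2.0
    sd = math.sqrt(4.0 * (3 * n - 4)) / (n - 1)
    z = (t - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


for t in (0.50, 0.75, 1.00, 1.25):
    print(t, round(normal_probability(t, 50), 5))
```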

TABLE II

Values of $P(\delta^2 < \delta_0^2)$, $n = 50$

 delta_0^2 / sigma^2      (29)        Normal

        .50              .00000       .00118
        .75              .00031       .00563
       1.00              .00647       .02129
       1.25              .04393       .06418


THE RETURN PERIOD OF FLOOD FLOWS
By E. J. Gumbel
New School for Social Research

Introduction. Engineers have used various interpolation formulas to repre¬ 
sent the observed distribution of flood discharges. These formulas are some¬ 
times constructed ad hoc for a given stream, and have no general meaning. Most 
of them are rather complicated. 1 Some authors have tried to introduce upper 
and lower limits to the discharges, even though it is doubtful that such limits 
exist. Others have introduced the third and fourth moments of the distribution, 
in spite of the fact that these numerical values are subject to large errors. For 
some formulas it is impossible to give a meaning to the constants; different form¬ 
ulas applied to the same stream give rather contradictory results; and conse¬ 
quently there is considerable confusion. For example, Slade [20] has stated that 
“the statistical method in whatever form employed is an entirely inadequate 
tool in the determination of flood frequencies.” According to Saville [19] “the 
engineer should satisfy himself that he has used an adequate number of methods, 
whether mathematical, graphic or otherwise, which have real support from either 
theory or experience, and then form his own judgement.” 

The main reason for this situation is that these studies have little or no
theoretical basis. The author believes it possible to give exact solutions, 
exactitude being interpreted from the standpoint of the calculus of probabilities 
[10]. Our solutions are simply the consequences of a truism: “The flood dis¬ 
charges are the largest values of the discharges.” The present study is but an 
explanation of this statement. 

Many American authors start with a statistical function, which we call the 
return period of floods. Therefore we shall first analyse the notion of return 
period and show how it can be derived as a consequence of the concept of dis¬ 
tribution. We then give a short résumé of the theory of largest values. The
discharge, and in consequence the flood discharge, is considered as an unlimited 
statistical variable; it is not necessary to determine its distribution. We are 
justified in representing the observed distribution of flows by one of the the¬ 
oretical distributions of largest values. The distribution we choose contains 
only two constants, and both have a clear hydrological meaning. The numeri¬ 
cal values are calculated by the method of moments. 

1 In recent years many articles discussing this topic have been published by the American
Society of Civil Engineers and the American Geophysical Union [8]. A review of some of
the proposed formulas is given in the Water Supply Paper 771 [17].

The application of the notion of return period to the largest values leads to a
simple formula for the return period of the floods. In the last part of this paper
we represent the flood flows of the Rhône and Mississippi Rivers by our formula.

1. The return period. Let us consider a continuous statistical variable $x$,
having a theoretical distribution $w(x)$. The probability $W(x)$ of a value less
than or equal to $x$, and the probability $P(x)$ of a value greater than or equal to
$x$, are

$$W(x) = \int_{-\infty}^{x} w(z)\, dz, \qquad P(x) = \int_{x}^{\infty} w(z)\, dz, \tag{1}$$

where $z$ denotes the variable of integration. Clearly

$$W(x) + P(x) = 1. \tag{1'}$$

Let $n$ be the number of observations. Let $x_m$ ($m = 1, 2, \ldots, n$) be the
observed values arranged in increasing magnitude, where $m$ is the serial number
beginning with the lowest ("from below"). The lowest observation has the
serial number $m = 1$, the highest has the serial number $m = n$. These observed
values will be written $x_1$ and $x_n$ respectively. The number of observations
below or equal to $x_m$ is $m = n\,{}'W(x_m)$, where ${}'W(x_m)$ is the observed relative
number corresponding to the probability $W(x)$. The graphic representation of
this series is called a cumulative histogram.

In hydraulics many authors arrange the observations in decreasing magnitude.
Let ${}_m x$ ($m = 1, 2, \ldots, n$) be these observed values. The serial number $m$ is
counted in a descending scale ("from above"). For the largest value $m = 1$,
for the lowest value $m = n$. The number of observations above or equal to
${}_m x$ is $m = n\,{}'P({}_m x)$, where ${}'P({}_m x)$ corresponds to $P(x)$. The numbers ${}'W(x_m)$
will never decrease; the numbers ${}'P({}_m x)$ will never increase. The $m$th value on
a descending scale is the $(n - m + 1)$th value on an ascending scale. Therefore

$$n\,{}'P({}_m x) = n - n\,{}'W(x_m) + 1, \tag{2}$$

and

$$n\,P(x) = n - n\,W(x). \tag{2'}$$

The difference between formulas (2) and (2') will play a certain role later.

Different methods are used in statistics in comparing the theoretical values
$W(x)$ or $P(x)$ and $w(x)$ with the corresponding observations ${}'W(x_m)$ or ${}'P({}_m x)$
(cumulative frequencies) and $\Delta\,{}'W(x_m)$ (frequency distribution). They all have
in common an arrangement of observed values according to magnitude.

For the purpose of considering the observations in chronological order, we
introduce a statistical criterion which at first glance may appear to have a new
logical structure. It is assumed here that the observations are made at constant
time intervals, and this interval is considered the unit of time. We suppose
that the observations are homogeneous, i.e., subject to a common set of forces.
Furthermore, we suppose that the events are independent of one another: the
occurrence of a high or low value for $x$ has no influence on the value of any
succeeding observation. Let us choose a low value $x$, and ask the following:
After what number of observations does this or a greater value return? We
calculate the mean of these chronological intervals between every two
consecutive values equal to or greater than $x$. We repeat these operations for a second,
third, ... up to the penultimate value of $x$.

These means are called the observed return periods. The criterion consists of
the comparison of the observed and the theoretical return periods for increasing
values of $x$. For a discontinuous variable we could obtain the return period for
a value equal to $x$ (not equal to or greater than $x$). This average time, which
is sometimes used in physics, does not interest us, as our variable, the discharge,
is continuous. We limit our consideration to the return period of a value equal
to or greater than $x$, which we call simply a value greater than $x$.

The determination of the theoretical return period is a classical problem:
How many trials must, on the average, be made, in order that an event of a
given probability should happen? Our event, the realization of a value equal
to or greater than $x$, has the probability $P(x) = 1 - W(x)$.

The mean number of trials $T(x)$ which are necessary to obtain our event once
is evidently

$$T(x) = \frac{1}{1 - W(x)}, \tag{3}$$

or

$$T(x) = \frac{1}{P(x)}. \tag{3'}$$

This value $T(x)$ is the mean chronological interval between two values equal
to or greater than $x$. If we start at the time when such a value has been
observed for the first time, we can interpret $T(x)$ as the theoretical return period
of a value equal to or greater than $x$. We designate it as the theoretical return
period. This concept has not been used in statistics. It is a well-known
concept in hydraulics which was introduced by Fuller [6]. To every theoretical
distribution $w(x)$ there is a corresponding return period $T(x)$ and conversely,
to every theoretical return period $T(x)$ there is a corresponding distribution

$$w(x) = \frac{T'(x)}{T^2(x)}, \tag{4}$$

obtained by differentiating (3).

If the variable is without limit to the left, the return period will start with
$T = 1$. If the variable is limited to the left by $x \geq \epsilon$, the corresponding return
period will be

$$T(\epsilon) \geq 1 \quad \text{if} \quad W(\epsilon) \geq 0. \tag{5}$$

In the graphic representation, the return period $T(x)$, which has a time
dimension, will be the abscissa and $x$ the ordinate. Therefore we consider $x$ as a
function of $T(x)$; from (4) we obtain

$$\frac{dx}{d \ln T} = \frac{1}{w(x)\,T(x)}, \tag{6}$$

where $\ln$ signifies the natural logarithm. The increase of $x$ as a function of
$\ln T(x)$ will be very rapid for small values of $T$. For a limited distribution
the same result is obtained, provided the probability $W(\epsilon)$ and the density of
probability $w(\epsilon)$ are sufficiently small. Clearly, the return periods of the three
quartiles are respectively $1\tfrac{1}{3}$, 2, 4. The return period will always increase
with $x$. It will tend towards infinity even if the variable is limited to the right.
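A short sketch of definition (3) follows. It converts a cumulative probability $W$ into the theoretical return period and confirms the values quoted for the three quartiles. The function name is illustrative.

```python
def return_period(W):
    """Theoretical return period T = 1 / (1 - W) for cumulative probability W."""
    if not 0.0 <= W < 1.0:
        raise ValueError("W must lie in the interval [0, 1)")
    return 1.0 / (1.0 - W)


# Return periods of the three quartiles: 4/3, 2 and 4 observations.
for W in (0.25, 0.50, 0.75):
    print(W, return_period(W))
```

Because $T$ grows without bound as $W$ approaches 1, the function rejects $W = 1$ rather than returning infinity.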

Let us now consider the calculus of the observed return periods. Instead of
values equal to or greater than $x_m$ we will only speak of values greater than $x_m$.
The observed return period is the interval between the first and the last
observation greater than $x_m$, divided by the number of intervals between all
observations greater than $x_m$. The number of observations greater than $x_m$ is
$n - n\,{}'W(x_m)$. Between these observations there are $n - n\,{}'W(x_m) - 1$ intervals.
This denominator is independent of the chronological order of the observed
values. We can calculate the mean of the observed intervals up to a value $x_m$
such that $n - n\,{}'W(x_m) = 2$. For this value of $x_m$ there are only two
observations, i.e., only one interval. In that case no mean can be calculated.

The numerator, the interval between the first and the last observation greater
than $x_m$, will be $n - 1$, provided that the first and the last value in chronological
order are greater than $x_m$. But in general the first value greater than $x_m$ will
be the $({}'k + 1)$th in chronological order. The first value greater than $x_m$ found
in the reverse chronological order will be the $(k' + 1)$th. Let ${}'k + k' = l$; then
the interval between the last and the first value greater than $x_m$ is $n - 1 - l$.
The mean observed interval is thus

$${}_{l}T(x_m) = \frac{n - 1 - l}{n - 1 - n\,{}'W(x_m)}. \tag{7}$$

This magnitude depends only on the chronological order of the first and the
last value greater than $x_m$. It is independent of the chronological order of all
other observations. Even in the case $l = 0$ this value differs from the theoretical
value (3). The observed value surpasses the theoretical value, even if the
frequency ${}'W(x_m)$ is identical with the probability $W(x)$.
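The mean observed interval (7) can be computed from a chronological series as sketched below. The function name and the sample series are hypothetical; positions are counted from zero, so the number of observations before the first exceedance and after the last exceedance together give $l$.

```python
def mean_observed_interval(series, threshold):
    """Mean chronological interval between successive values greater than threshold."""
    hits = [t for t, value in enumerate(series) if value > threshold]
    if len(hits) < 2:
        raise ValueError("at least two exceedances are needed to form an interval")
    n = len(series)
    before_first = hits[0]          # observations preceding the first exceedance
    after_last = n - 1 - hits[-1]   # observations following the last exceedance
    l = before_first + after_last
    # Numerator n - 1 - l; denominator = number of exceedances minus one.
    return (n - 1 - l) / (len(hits) - 1)


flows = [3, 9, 1, 8, 2, 7, 1]
print(mean_observed_interval(flows, 5))

# Moving an interior exceedance does not change the result.
print(mean_observed_interval([3, 9, 8, 1, 2, 7, 1], 5))
```

Both calls give the same value, which illustrates that only the positions of the first and last exceedances matter.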

In the general case, $l > 0$, this difference is a function of $l$. The number $l$
depends upon the times at which the observations begin and cease; but it is
not a characteristic of the chronological order. As a result of these
disadvantages of formula (7) we prefer to introduce other definitions, in which the
chronological order does not enter. These definitions have an added advantage
in that they are constructed in a manner analogous to the theoretical formula.
The observed value which corresponds to (3) is

$${}'T(x_m) = \frac{n}{n - n\,{}'W(x_m)}, \tag{8}$$

or

$${}'T(x_m) = \frac{n}{n - m}. \tag{9}$$

But this definition of the observed return period is not the only one which
corresponds to (3). Starting with the serial number $m$, in a descending scale,
Fuller [6] puts

$${}''T({}_m x) = \frac{n}{m}. \tag{8'}$$

According to this definition, the return period of the $m$th value from below is

$${}''T(x_m) = \frac{n}{n - m + 1}. \tag{9'}$$

TABLE I

Two definitions of the observed return periods

 Observed    Serial number   Serial number   Exceedance interval   Recurrence interval
 variable    from below      from above      formula (9)           formula (9')

 x_1         1               n               n/(n - 1)             1
 x_2         2               n - 1           n/(n - 2)             n/(n - 1)
 x_m         m               n - m + 1       n/(n - m)             n/(n - m + 1)
 x_{n-1}     n - 1           2               n/1                   n/2
 x_n         n               1               (none)                n/1


This observed return period corresponds to the theoretical return period (3').
The difference between (9) and (9') results from the fact that the relation (2)
between the observed cumulative frequencies ${}'W(x_m)$ and ${}'P({}_m x)$ differs from the
relation (2') between the probabilities $W(x)$ and $P(x)$. The two definitions
of the observed return periods are related by

$${}''T(x_{m+1}) = {}'T(x_m) < {}'T(x_{m+1}). \tag{10}$$
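The two observed return periods and the relation between them can be tabulated as in the sketch below, which uses exact fractions. The function names are illustrative; the exceedance interval is returned as `None` for the largest observation, where formula (9) has no value.

```python
from fractions import Fraction


def exceedance_interval(n, m):
    """Formula (9): n / (n - m) for the m-th value from below; None when m = n."""
    if m >= n:
        return None
    return Fraction(n, n - m)


def recurrence_interval(n, m):
    """Formula (9'): n / (n - m + 1) for the m-th value from below."""
    return Fraction(n, n - m + 1)


n = 10
for m in range(1, n + 1):
    print(m, exceedance_interval(n, m), recurrence_interval(n, m))
```

Each recurrence interval equals the exceedance interval of the preceding observation, so the two columns are the same sequence shifted by one serial number.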

From a purely logical standpoint the first definition is as justifiable as the
second one. Both are used in hydraulics. In order to avoid confusion between
formulas (9) and (9') Horton [16] calls ${}'T(x_m)$ the exceedance interval, i.e., "the
average interval at which an event of given magnitude is exceeded," whereas
he defines ${}''T(x_m)$, the recurrence interval, as "the average interval of occurrence
of values equalling or exceeding a given magnitude." Of course, the exceedance
interval surpasses the recurrence interval. Since both observed intervals
correspond to a common theoretical return period we designate both of them as
observed return periods.

The difference between formulas (9) and (9') is made clear in Table I.





Each of the definitions (9) and (9') and the theoretical expression $T(x)$ has
different properties. For the lowest observation

$$n\,{}'W(x_1) = 1; \qquad n\,{}'P({}_n x) = n.$$

Therefore

$${}'T(x_1) = 1 + \frac{1}{n - 1}; \qquad {}''T(x_1) = 1,$$

whereas for an unlimited distribution $\lim_{x \to -\infty} T(x) = 1$.

If the number of observations is sufficiently large the numerical differences
between the two observed periods are rather small, except for very large values
of the variable. For the last observation

$$n\,{}'W(x_n) = n; \qquad n\,{}'P({}_1 x) = 1.$$

Therefore the return period ${}'T(x_n)$ for the last observation does not exist.
According to the second definition the return period for the last value is equal to
the total number of observations. But in general there is only one observation
of the last value.

The preference given formula (9) over (9') corresponds with the preference
given to $W(x)$ over $P(x)$ when comparing the theoretical with the observed
values. Therefore it is natural to count $m$ from below. Since both definitions
are equally applicable and since they lead to different results for large values of
the variable, one should not calculate the return period for a small number of
observations.

The observed return periods (9) and (9') differ from the theoretical return
period (3) in the same way that the frequencies ${}'W(x_m)$ or ${}'P({}_m x)$ differ from the
probabilities $W(x)$ or $P(x)$. The chronological order enters neither into formula
(7) nor into (9) or (9'). We need not take it into consideration, since the
theoretical return period is obtained from the probability and the observed
return period from the cumulative histogram. Therefore the usual statistical
methods can be used for making the comparison between observed and
theoretical return periods.

The return period is a statistical function like the distribution $w(x)$ or the
probability $W(x)$. No formula for $T(x)$ that contradicts the properties of $w(x)$
can be accepted. The return period $T(x)$ will contain the same number of
independent constants as the distribution $w(x)$. Consequently the fit of the
theoretical curve $T(x)$ to the observations ${}'T(x_m)$ or ${}''T(x_m)$ cannot be improved by
introducing a new constant without also changing the distribution $w(x)$. The
theoretical curve $x = f(T)$ will fit the observed curves $(x_m, {}'T(x_m))$ and
$(x_m, {}''T(x_m))$ in a way that depends upon the fit of $W(x)$ and $P(x)$ to ${}'W(x_m)$
and ${}'P({}_m x)$.

Let us suppose that $w(x)$ contains $k$ constants; that they are determined by the
method of moments which conserves the arithmetic mean $\bar{x}$, the mean of the
squares $\overline{x^2}$, etc., of the observed distribution. For the return period these
moments have a meaning. Let us consider for the sake of simplicity a positive
variable. The $k$th moment $M_k$

$$M_k = \int_0^\infty x^k\, dW(x) = -\int_0^\infty x^k\, d(1 - W(x)) = k \int_0^\infty (1 - W(x))\, x^{k-1}\, dx$$

is according to (3)

$$M_k = k \int_0^\infty \frac{x^{k-1}}{T(x)}\, dx, \tag{11}$$

whence for $k = 1$ and $k = 2$

$$\bar{x} = \int_0^\infty \frac{dx}{T(x)}; \qquad E(x^2) = 2 \int_0^\infty \frac{x\, dx}{T(x)}. \tag{11'}$$
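The relation between the moments and the reciprocal of the return period can be checked numerically. The sketch below is an illustration under an assumed example that does not appear in the paper: the exponential distribution $W(x) = 1 - e^{-x}$, for which $T(x) = e^{x}$, the mean is 1 and the mean square is 2. The function name, the upper limit and the step count are illustrative.

```python
import math


def moment_from_return_period(T, k, upper, steps=20000):
    """k-th moment as k * integral of x^(k-1) / T(x) over (0, upper), by Simpson's rule."""
    h = upper / steps

    def g(x):
        return x ** (k - 1) / T(x)

    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return k * total * h / 3.0


def exponential_return_period(x):
    """T(x) = 1 / (1 - W(x)) for W(x) = 1 - exp(-x)."""
    return math.exp(x)


mean = moment_from_return_period(exponential_return_period, 1, 50.0)
mean_square = moment_from_return_period(exponential_return_period, 2, 50.0)
print(mean, mean_square)
```

The upper limit replaces the infinite range; it only needs to be large enough that $1/T(x)$ is negligible beyond it.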


For a given distribution containing two constants, the method of moments
conserves the area and the center of gravity of the reciprocal of the return period.
Even if the method of moments gives the best determination of the constants
for the distribution, it need not give the best determination for the return
period. But if the observed return periods were used for the determination of
the constants we would get two sets, since there are two observed curves having
equal validity, but different values for large $x$. We will get one and only one
set if the constants are calculated from the observed distribution, for here the
difference between ${}'T(x_m)$ and ${}''T(x_m)$ does not matter. The fact that we do
not take the constants from the observed return periods, but from another
statistical function, might be a cause for deviations between the observed and
the theoretical return periods.

Once the constants have been found, we compare the observed curves
$(x_m, {}'T(x_m))$ and $(x_m, {}''T(x_m))$ with the theoretical curve $x = f(T)$. To avoid
discontinuity the observed return period will be established for all values of $x_m$
arranged in increasing order.

If the observed return periods for small values of $x$ are systematically smaller
(greater) than the theoretical period, it is reasonable to conclude that there
exists an attraction (repulsion) for small values of the variable and a repulsion
(attraction) for the large values. But it must be remembered that the observed
values have different weights in that the return periods for small values of $x$ are
based on many observations. This number diminishes as $x$ increases. The last
observed return period is based only on two observations. Therefore the
divergence between theory and observation will increase with the variable. With
this precaution the criterion of the return period suggests one cause of difference
between theory and observation. In order to apply this method to the largest
values we must first establish the corresponding distribution.





2. Theory of the largest value. Let $x$ be a statistical variable unlimited to
the right having the distribution $w(x)$. Among the $N$ observed values, one will
be larger than the others. We wish to determine its theoretical value.

According to the principle of multiplication the probability $\mathfrak{W}_N(x)$ that $N$
values are inferior to $x$ is

$$\mathfrak{W}_N(x) = W^N(x). \tag{12}$$

This is the probability of $x$ being the largest value. The largest value is a new
statistical variable which possesses a mode, a mean $\bar{u}$, a standard deviation $s$
and higher moments. To get the mean the distribution $\mathfrak{w}_N(x)$ of the largest
value is needed. From (12) by differentiation

$$\mathfrak{w}_N(x) = N\, W^{N-1}(x)\, w(x). \tag{13}$$

The mode will be the solution of

$$\frac{N - 1}{W(x)}\, w(x) + \frac{w'(x)}{w(x)} = 0. \tag{13'}$$

For a given initial distribution $w(x)$ and for small $N$ we have to solve this
equation. But the mean and the moments cannot be obtained in a general way by
the use of the exact distribution (13). However we can reach general solutions
if $N$ is large, provided we limit ourselves to certain classes of initial distributions.
We have studied this problem in previous publications [11 13]. For our present
purpose it is sufficient to give the results in a form due to R. von Mises [18].

We define a large value $u$ of the variable $x$ by

$$N(1 - W(u)) = 1. \tag{14}$$

This means that the expected number of observations equal to or greater than $u$
is one. Equation (14) is but another form of definition (3). The mean number
of trials is used in (3) whereas the original variable $x$ is used in (14).

The probability $\alpha\, du$ that a value greater than $u$ will be contained between $u$
and $u + du$ is given by

$$\alpha = \frac{w(u)}{1 - W(u)}. \tag{15}$$

Obviously $\alpha$ and $u$ are functions of $N$ and the constants in the initial
distribution $w(x)$. There are two limiting forms of the probability (12)

$$\lim_{N \to \infty} \mathfrak{W}_N(x) = F(x); \qquad \lim_{N \to \infty} \mathfrak{W}_N(x) = \mathfrak{W}(x).$$

If

$$\lim_{u \to \infty} \alpha u = k > 0, \tag{16}$$

we obtain

$$F(x) = e^{-(u/x)^k}. \tag{17}$$

This probability function was first established by Fréchet [5]. If

$$\lim_{u \to \infty} \frac{d}{du}\left(\frac{1}{\alpha}\right) = 0, \tag{18}$$

we obtain

$$\mathfrak{W}(x) = e^{-e^{-\alpha(x - u)}}. \tag{19}$$

This probability function is due to R. A. Fisher [4]. Let us consider the first
limit. The initial distributions which lead to it belong to the Pareto type.
For this distribution

$$W(x) = 1 - \frac{1}{x^k}$$

and condition (16) holds; for any value of $x$

$$\frac{x\, w(x)}{1 - W(x)} = k.$$

The distribution $f(x)$ of the largest value, which corresponds to (17), is

$$f(x) = \frac{k}{u} \left(\frac{u}{x}\right)^{k+1} e^{-(u/x)^k}. \tag{20}$$

The mode $\tilde{x}_N$ of the largest value is the solution of

$$-\frac{k + 1}{x} + \frac{k}{x} \left(\frac{u}{x}\right)^k = 0;$$

hence

$$\ln \tilde{x}_N = \ln u - \frac{1}{k} \ln \frac{k + 1}{k}. \tag{21}$$

According to the definition (14) the mode of the largest value will increase
with $N$. For a finite number of observations, which is always the case, the
mode will be limited. But the moments of order $k$ or higher will not exist.
For $k < 1$, no moment will exist. For $k < 2$, only the first moment, the mean,
exists, and so on.

Let us consider now the second limit (19). The initial distributions which
lead to it belong to the exponential type. For this distribution [14]

$$w(x) = e^{-x}, \qquad x \geq 0,$$

and for any value of $x$

$$\frac{d}{dx}\left(\frac{1 - W(x)}{w(x)}\right) = 0,$$

which means that condition (18) is fulfilled. Most of the distributions used in
statistics belong to this type. According to (19) the distribution of the largest
value is

$$\mathfrak{w}(x) = \alpha\, e^{-\alpha(x - u) - e^{-\alpha(x - u)}}. \tag{22}$$

If we introduce a reduced variable $y$ without dimension by the linear
transformation

$$y = \alpha(x - u), \tag{23}$$

we get the reduced probability $\mathfrak{W}(y)$

$$\mathfrak{W}(y) = \mathfrak{W}(x) = e^{-e^{-y}}. \tag{24}$$

The numerical values of this function, calculated by means of Becker's tables [1],
are given in Table II, col. 1 and 2. The reduced distribution

$$\mathfrak{w}(y) = e^{-y - e^{-y}} \tag{25}$$

makes clear the meaning of $u$: the distribution has one and only one maximum
which occurs for the reduced value $y = 0$. Therefore $u$ is the mode of the
largest value for a given set of $N$ observations. For an initial distribution $w(x)$
satisfying (18), and for large $N$, definition (3) of the return period as a function
of $x$ becomes identical with relation (14) which involves the number of
observations $N$ and the corresponding most probable value $u$.
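How close the limiting form is to the exact probability of the largest value can be illustrated numerically. The sketch below assumes an exponential initial distribution, $W(x) = 1 - e^{-x}$, chosen only because $u$ and $\alpha$ are then available in closed form: equation (14) gives $u = \ln N$ and equation (15) gives $\alpha = 1$. The function names and the choice $N = 365$ are illustrative.

```python
import math


def exact_largest_value_probability(x, N):
    """Probability (12), W(x)**N, for the initial distribution W(x) = 1 - exp(-x)."""
    return (1.0 - math.exp(-x)) ** N if x > 0 else 0.0


def double_exponential_probability(x, u, alpha):
    """Limiting probability (19): exp(-exp(-alpha * (x - u)))."""
    return math.exp(-math.exp(-alpha * (x - u)))


N = 365
u = math.log(N)   # solves N * (1 - W(u)) = 1
alpha = 1.0       # w(u) / (1 - W(u)) for the exponential distribution

for y in (-1.5, -1.0, 0.0, 1.0, 2.0, 4.0):
    x = u + y / alpha  # reduced variable y = alpha * (x - u)
    print(y, exact_largest_value_probability(x, N),
          double_exponential_probability(x, u, alpha))
```

For this choice of $N$ the two probabilities differ by less than one part in a thousand over the range shown.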

We wish to decide which distribution of the largest value is to be used to 
represent the given observations. This decision depends, according to (16) and 
(18), on the nature of the initial distribution at the extreme values of the 
variable. If the law of the observed initial variable is known, a precise answer 
can be given. But generally speaking, a distribution chosen to represent given 
observations is nothing but an interpolation formula. Formulas having different 
analytical properties may all give satisfactory results. One might fulfill condi¬ 
tion (16), and another (18). The conditions apply to the differential coefficient, 
whereas the initial observations are always discontinuous. Therefore they will 
not enable us to decide which, if any, of the conditions is met. For extreme 
values of the variable x the observed differences are large and nonuniform, and 
there is therefore no way to replace the differentiation by a finite difference. 
Consequently we have to use the observations of the largest values to control 
the two competing theories and not the conditions. The fact that distribution 
(20) has higher moments only under certain conditions, is a strong practical 
argument in favor of distribution (22). Therefore the following development 
will be based on this distribution. 

It can be shown that the mean error $\theta$ of distribution (22) is related to the
constant $\alpha$ by

$$\theta = 0.98/\alpha. \tag{26}$$

Therefore the constant $u$ is the most probable largest value for $N$ observations
and $1/\alpha$ a multiple of the mean error.

TABLE III

Observed return periods
Rhône, Lyon (France) (1826-1936)

 Flood discharge   Serial number   Return period
      x_m               m          log 'T(x_m)

       899               1            .004
      1172               2            .008
      1231               3            .012
      1272               4            .016
      1272               5            .020
      1432               6            .024
      1432               7            .028
      1439               8            .032
      1444               9            .037
      1502              10            .041
      1541              11            .045
       ...              12            .050
      1639              13            .054
      1706              14            .058
      1780              15            .063
      1829              16            .068
      1850              17            .072
      1857              18            .077
      1913              19            .081
      1913              20            .086
      1934              21            .091
      1955              22            .096
      1992              23            .101
      1992              24            .106
      2006              25            .111
      2006              26            .116
      2013              27            .121
      2050              28            .126
      2050              29            .131
      2072              30            .137
      2094              31            .142
      2101              32            .148
      2115              33            .153
      2145              34            .159
      2145              35            .164
      2153              36            .170

 Flood discharge   Serial number   Return period
      x_m               m          log 'T(x_m)

      2475              57            .313
      2475              58            .321
      2475              59            .329
      2491              60            .338
      2514              61            .346
      2514              62            .355
      2514              63            .364
      2514              64            .373
      2538              65            .382
      2554              66            .392
      2586              67            .402
      2594              68            .412
      2594              69            .422
      2594              70            .432
      2602              71            .443
      2626              72            .454
      2627              73            .465
      2643              74            .477
      2675              75            .489
      2675              76            .501
      2773              77            .514
      2773              78            .527
      2773              79            .540
      2839              80            .554
      2856              81            .568
      2881              82            .583
      2881              83            .598
      2965              84            .614
       ...              85            .630
       ...              86            .647
       ...              87            .665
       ...              88            .684
       ...              89            .703
      3126              90            .723
      3179              91            .744
      3214              92            .766

The substitution of the numerical values leads to


(30')    $\frac{x - \bar{u}}{s} = 0.77970\,y - 0.45005.$

Conversely,

(31)    $y = 1.28255\,\frac{x - \bar{u}}{s} + 0.57722.$

The value

(32)    $v = s/\bar{u},$

the coefficient of variation, is related to the product αu. By (27)
αu = αū - c and by (28)

(33)    $\alpha u = \frac{\pi}{\sqrt{6}\,v} - c.$

Therefore the numerical value of αu can also be considered as a characteristic
of an observed distribution of largest values.

For the two constants we calculate for the observed distribution of largest
values the two first moments

(34)    $\bar{u} = \frac{1}{n} \sum_{m=1}^{n} x_m$

and

(35)    $\overline{u^2} = \frac{1}{n} \sum_{m=1}^{n} x_m^2.$

To get the observed standard deviation we use the Gaussian formula

(36)    $s = \sqrt{\left(1 + \frac{1}{n-1}\right)\left(\overline{u^2} - \bar{u}^2\right)}.$

According to (28) and (27)

(37)    $\frac{1}{\alpha} = 0.7796968\,s,$

and

(38)    $u = \bar{u} - \frac{0.5772157}{\alpha}.$

These formulas give the two constants in the distribution of largest values. 
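
The arithmetic of (36)-(38) can be sketched in a few lines. The function name `gumbel_constants` is hypothetical and not part of the original article; the inputs are the number of years, the annual mean flood, and the mean squared flood as reported for the Rhône and the Mississippi in Table IV.

```python
import math

def gumbel_constants(n, mean, mean_sq):
    """Return (s, 1/alpha, u) from the first two sample moments.

    s          : standard deviation with the Gaussian correction, formula (36)
    1/alpha    : 0.7796968 * s, formula (37)
    u          : most probable annual flood, formula (38)
    """
    s = math.sqrt((1 + 1 / (n - 1)) * (mean_sq - mean ** 2))
    inv_alpha = 0.7796968 * s
    u = mean - 0.5772157 * inv_alpha
    return s, inv_alpha, u

# Moments taken from Table IV of the article.
rhone = gumbel_constants(111, 2493.5, 6707555.0)
mississippi = gumbel_constants(50, 1355.6, 1951828.8)

for name, (s, inv_alpha, u) in (("Rhone", rhone), ("Mississippi", mississippi)):
    print(f"{name}: s = {s:.1f}, 1/alpha = {inv_alpha:.1f}, u = {u:.1f}")
```

Under these inputs the results should agree with the last three rows of Table IV up to rounding in the final digit.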


3. Flood flows interpreted as largest values. We will now apply the theory
of largest values to flood flows. Let us consider the daily flow as a statistical 
variable, unlimited to the right. This idea is not new. The formulas proposed 
by Fuller [7], Hazen [15], and numerous other authors all incorporate this 
assumption. Gibrat [9] supposes that the daily flows vary according to Galton’s 
distribution. Instead of postulating a specific formula for the distribution of 
flows we shall only suppose that it belongs to the usual exponential type, which 
means that condition (18) is fulfilled. 

We define a flood as being the largest value of the N = 365 daily flows. The
flood flows are therefore the largest values of flows. This commonplace implies
the distinction between floods and inundations. For each year there exists one
or more floods of the same magnitude, but there might exist several different
inundations or none at all. If there are several inundations in a year the
greatest one will be a flood; but a flood need not be an inundation: even a
dry year has a flood. We limit ourselves to floods, assume that N = 365 is a
large number, and represent the distribution of annual floods by the distribution
(22) of largest values.

There have been objections to the concept that the daily flow is an unlimited 
variable. Horton [16] believes that this implies the absurd idea of unlimited 
floods. This opinion is shared by Slade [20], who claims that there is a definite 
upper limit to the magnitude of the floods for a given stream. The theory of 
largest values confirms only partially Horton’s opinion. If we should choose 
distribution (20), the most probable annual flood will be limited. For this 
distribution, however, it might happen that the mean annual flood has no 
meaning. To avoid this we have chosen distribution (22), for which the mean 
annual flood and all the moments will be finite. A further justification of the 
use of (22) might be derived from the fact that Galton’s distribution belongs to 
the exponential type. As a final argument, numerical calculations show that 
formula (22) gives a better fit to the observed distributions of flows. 

The variable x is the annual flood flow measured in cubic meters or cubic
feet per second. The mean ū is the annual mean flood, whereas u is the most
probable annual flood. The value s is the standard deviation of the
distribution of annual floods. Finally y is called the reduced flood.

The distribution (22) possesses the properties of the observed distribution of 
flood flows. It is asymmetrical; rising rather quickly but falling rather slowly. 
The modal value is to the left of the mean (see Fig. 3). 

To apply the theory of return periods let us consider the event of the highest
annual discharge being greater than x. We have to replace in formula (3) the
general probability W(x) by the probability of flood discharges (19). The
number of observations n is the number of years for which observations exist.

To use formula (3) we have to suppose that the intervals between the suc¬ 
cessive floods are all equal to one year. This assumption conforms more or less 
to the seasonal nature of floods. 

The return period of a flood greater than x,

(39)    $T(x) = \frac{1}{1 - e^{-e^{-y}}},$

is the arithmetic mean of the intervals between two years which have a flood
discharge greater than x; the discharges for the intervening years are all less
than x. Therefore T(x) is the mean of the number of years for which x will be
surpassed once. Formula (39) gives the meaning of u from the standpoint of
the return period. For y = 0,

$T(u) = \frac{e}{e - 1}.$

The return period T(u) of the most probable annual flood is 1.58198 years. In
other words, the constant u is the flood discharge with return period

(40)    log T(u) = 0.19920,

where log signifies the common logarithm. The return period of the mean 
annual flood is by (27) and (39) equal to 2.32762 years. 
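
As a quick numerical check of (39) and (40), the sketch below evaluates the return period at the mode (y = 0) and at the mean (y = c, with c taken as the constant 0.5772157 of formula (38)). The function name `return_period` is an illustrative choice, not something defined in the article.

```python
import math

def return_period(y):
    """Theoretical return period T as a function of the reduced flood y, formula (39)."""
    return 1.0 / (1.0 - math.exp(-math.exp(-y)))

t_mode = return_period(0.0)          # most probable annual flood, y = 0
t_mean = return_period(0.5772157)    # mean annual flood, y = c

print(f"T(u)      = {t_mode:.5f} years, log T(u) = {math.log10(t_mode):.5f}")
print(f"T(mean)   = {t_mean:.5f} years")
```

The first value equals e/(e - 1), and the second should reproduce the 2.32762 years quoted for the mean annual flood.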

Let us now consider the relation between the flood discharge x and its return 
period for small and large values of x. To small values of x correspond large 
negative values of y and therefore return periods T approximating 1. The 
distribution (25) of the largest values being unlimited, the flood discharge con¬ 
sidered as a function of log T will by (6) increase rapidly at first. To large 
values of x correspond large values of y and T(x). If we introduce the natural 
logarithm, (39) gives 

$-\ln\left(1 - \frac{1}{T(x)}\right) = e^{-y}.$

For large values of x, viz., T(x) ≥ 10, it is sufficiently accurate to use

$\frac{1}{T(x)} = e^{-y},$

so that

(41)    $y = \ln T(x).$

If the common logarithm is used,

(42)    $\log T(x) = 0.434294\,\alpha\,(x - u).$

The logarithm of the mean number of years for which the flood discharge will
once be exceeded converges towards a linear function of x. This property of
the distribution of largest values was established by M. Coutagne [2]. Let us
write

(43)    $x = u + \frac{2.30258}{\alpha} \log T(x).$

Then 1/α can be considered as a measure of the increase of a flood discharge
with respect to the logarithm of time.

According to the general formulas (6) and (42) the shape of the return period 
as a function of the flood discharge x is as follows: at the beginning i.e., for small 
flood discharge, the return periods are close to 1 and increase very slowly. At 
the end, i.e., for large flood discharges, the logarithm of the return period con¬ 
verges to a linear function of x. 
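
To see how quickly the exact return period (39) approaches the asymptote (41), one can tabulate both for a few reduced floods. This is only a sketch; the helper names are hypothetical, and `math.expm1` is used merely to keep the subtraction accurate for large y.

```python
import math

def return_period(y):
    """Exact return period from formula (39); -expm1(-t) equals 1 - exp(-t)."""
    return 1.0 / (-math.expm1(-math.exp(-y)))

def asymptotic_return_period(y):
    """Asymptotic form implied by (41): ln T = y."""
    return math.exp(y)

def relative_gap(y):
    """Relative shortfall of the asymptote with respect to the exact value."""
    exact = return_period(y)
    return (exact - asymptotic_return_period(y)) / exact

for y in (0.0, 1.0, 2.25, 4.0, 6.0):
    print(f"y = {y:4.2f}  exact T = {return_period(y):9.3f}  "
          f"e^y = {asymptotic_return_period(y):9.3f}  gap = {relative_gap(y):.3%}")
```

At y = 2.25 the exact return period is close to ten years and the asymptote falls short by about five per cent; the gap shrinks as y grows.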

Another form of (43) is

(44)    $\frac{x}{u} = 1 + \frac{2.30258}{\alpha u} \log T(x).$

The ratio of the flood discharge which will be exceeded in the mean once in T
years to the modal annual flood converges to a linear function of the logarithm
of the return period. The constant 1/(αu), of dimension zero, depends, by (33),
on the coefficient of variation. Its value is a characteristic of the stream. If
we introduce the arithmetic mean ū and the standard deviation s we obtain
by (42), (27), and (28)

$x = \bar{u} - 0.45005\,s + (0.77970)(2.30258)\,s \log T(x).$

Therefore, approximately,

(45)    $\frac{x}{\bar{u}} \approx 1 - 0.45\,v + 1.796\,v \log T(x).$

The right hand member of this linear equation contains only one constant, the
coefficient of variation of the floods. Finally by (42) and (31)

(46)    $\log T(x) = 0.25068 + 0.55700\,\frac{x - \bar{u}}{s}.$


There is still another way of interpreting these asymptotic formulas. Let
T(2x) be the return period of the value 2x; then by (43)

$2x = u + \frac{1}{\alpha} \ln T(2x),$

therefore

$2 = \frac{\alpha u + \ln T(2x)}{\alpha u + \ln T(x)},$

and finally

(47)    $T(2x) = T^2(x)\, e^{\alpha u}.$


The return period of a flood of magnitude 2x is equal to the square of the 
return period of x multiplied by a factor which depends only upon the coefficient 
of variation. 
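
A short numerical illustration of (47) in the asymptotic regime may help. It assumes the Rhône constants u = 2177.0 and 1/α = 548.2 from Table IV and an arbitrary discharge x = 3500; the function name is hypothetical.

```python
import math

U = 2177.0              # most probable annual flood, Table IV (Rhone)
INV_ALPHA = 548.2       # constant 1/alpha, Table IV (Rhone)
ALPHA = 1.0 / INV_ALPHA

def asymptotic_return_period(x):
    """Return period from the asymptotic relation ln T(x) = alpha * (x - u)."""
    return math.exp(ALPHA * (x - U))

x = 3500.0  # illustrative discharge, chosen so that T(x) exceeds ten years
lhs = asymptotic_return_period(2 * x)
rhs = asymptotic_return_period(x) ** 2 * math.exp(ALPHA * U)

print(f"T(x) = {asymptotic_return_period(x):.2f} years")
print(f"T(2x) = {lhs:.1f} years, T(x)^2 * exp(alpha*u) = {rhs:.1f} years")
```

Both sides agree identically here because the check uses the asymptotic relation itself; with the exact formula (39) the identity would hold only approximately.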

All these asymptotic formulas are good approximations only for return periods
above ten years, which means, according to Table II, y ≥ 2.25 or, according
to (23), (30) and (31), x ≥ ū + 1.3s. The corresponding value of the flood
probability is by (3) 𝔚(x) ≥ 0.9. The consequences of (41) can be applied to
only 10% of the observations, i.e. to the large flood discharges. Their observed
return periods are based on a few observations and may therefore differ
considerably from the theoretical values. In spite of the above restrictions the
linear formula (43) has a meaning for values of T equal to or greater than unity.
We now ask: How will the most probable largest value increase with the number
of observations? This number of years can again be called T. The answer to
the above question requires the solution of (13') where the distribution (25) of
largest values 𝔴(y) must be introduced as the initial distribution w(x).

From (24)

$\frac{-1 + e^{-y}}{T - 1} + e^{-y} = 0,$

or

$T e^{-y} = 1,$


which is identical with (41). For T = 1 the most probable annual flood is of
course u. Therefore the relation (41), valid for T ≥ 1, means: The most
probable flood u(T) to be reached within T years is a linear function of the
logarithm of T,

(41')    $u(T) = u + \frac{2.30258}{\alpha} \log T.$

The constant 1/α is the slope of this straight line. The results (41)-(46) are
related to Fuller's well-known formula [6]. This author, the first to investigate
flood flows systematically, proposed a linear relation between the logarithm of
the return period and the arithmetic mean of the flood discharges greater than
the mth value (m taken from above). A similar empirical formula has been
stated by Lane [7] and has been applied by Saville [19]. The similarities and
differences between these interpolation formulas and our theory can be stated 
in the following way: If we start from the theory of largest values we reach 
these formulas as asymptotic expressions for the return period of large floods. 
Considered this way, our theory gives a certain justification to Fuller's hypothe¬ 
sis. But Fuller's and similar formulas were intended to apply to all flood 
discharges. Now, the distribution of the flood discharges (4) corresponding to 
these return periods does not fit the observations. It can be shown that these 
formulas involve the assumption of a simple exponential distribution φ(x) for
the flood discharges

(48)    $\varphi(x) = \frac{1}{\bar{u} - \epsilon}\, e^{-(x - \epsilon)/(\bar{u} - \epsilon)}$

and the existence of a lower limit ε of the flood discharges given by ε = ū - s.
In Fuller's formula all flood discharges must be greater than 2/3 of the mean 
annual flood. The density of probability always diminishes with increasing 
magnitude of the flood. This neglects the ascending branch (about one third) 
of the distribution of floods (see Fig. 3) and is incompatible with the observed 
facts. We therefore prefer our formula which takes account of the total varia¬ 
tion, but we do not minimize the importance of Fuller's work which has led to 
much valuable research. 

Formula (39) gives the theoretical return periods T(x) as a function of the
reduced flood discharge y, and holds for the entire range of observations. The
general numerical values are given in Table II, cols. 1 and 3. For a given stream,
the return period of a flood discharge greater than x depends by (23) upon the
two constants α and u. If these values have been calculated by (37) and (38)
the theoretical flood discharge x corresponding to T(x) is obtained by the
linear transformation

(49)    $x = u + y/\alpha.$

The asymptotic formula (42) suggests the coordination of the flood discharges
x and the logarithm of the return periods. 

4. Rhône and Mississippi Rivers. We think that our system of formulas is
simple, logically consistent and free of artificial assumptions. Now it remains
to be shown that the arithmetic involved is simple and that the results fit the
observations. For the Rhône we shall analyze the observed cumulative
frequency, the distribution, and the return periods. For the Mississippi River
we shall limit ourselves to the return periods.

TABLE IV
Calculation of constants

Stream, observation station                      Rhône, Lyon    Mississippi River,
                                                 (France)       Vicksburg (Miss.)
Period                                           1826-1936      1890-1939

Number of observations ............ n                    111                50
Annual mean flood ................. ū                2,493.5           1,355.6
Mean squared flood ................ $\overline{u^2}$     6,707,555.0       1,951,828.8
Standard deviation ................ s                  703.1             341.3
Constant .......................... 1/α                548.2             266.1
Most probable annual flood ........ u                2,177.0           1,201.9

For each year we choose the maximum of the daily discharges (we do not use
momentary peaks). The 111 values x_m for the Rhône 1826-1936 published by
Coutagne [3] and arranged in order of increasing magnitude are given in Table III
(col. 1). The supposition that the intervals between consecutive floods are all
equal to one year is not always true. Only 77 of the 111 floods occurred between
October and March, whereas 34 were scattered throughout the year. But the
differences in the lengths of the intervals compensate each other. The second
column of Table III contains the serial number m. According to (9) we
calculate for the mth observed flood discharge x_m, taken in ascending magnitude,
the logarithm of the observed return period log n/(n - m) (col. 3), where n = 111
and m = 1, 2, ..., 110, and obtain the exceedance intervals. The other
observed curve, the recurrence interval, is obtained by (10) through the
coordination of x_{m+1} and log n/(n - m). Both curves are plotted in Fig. 1. The
recurrence and exceedance intervals differ for the large flood discharges. The 
observed flood discharges arranged in increasing magnitude are plotted in the 
cumulative histogram, Fig. 2. 
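
The construction of the two observed curves can be sketched as follows. The function names and the five-value sample are hypothetical and serve only to show the pairing of x_m and x_{m+1} with log n/(n - m); they are not data from the article.

```python
import math

def observed_log_return_periods(n):
    """Common logarithm of n/(n - m) for m = 1, ..., n - 1."""
    return [math.log10(n / (n - m)) for m in range(1, n)]

def observed_curves(discharges):
    """Return (exceedance, recurrence) point lists for annual maximum discharges.

    exceedance pairs x_m     with log n/(n - m),
    recurrence pairs x_{m+1} with log n/(n - m).
    """
    x = sorted(discharges)                  # ascending magnitude
    logs = observed_log_return_periods(len(x))
    exceedance = list(zip(x[:-1], logs))
    recurrence = list(zip(x[1:], logs))
    return exceedance, recurrence

# Hypothetical sample of annual maxima, for illustration only.
sample = [1230.0, 912.0, 1516.0, 1073.0, 2040.0]
exceedance, recurrence = observed_curves(sample)
for (x_m, log_t), (x_next, _) in zip(exceedance, recurrence):
    print(f"log T = {log_t:.4f}  exceedance x = {x_m:7.1f}  recurrence x = {x_next:7.1f}")
```

The largest observation has no finite observed return period under this rule, which is why m stops at n - 1.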

To compare these observations with our theory, we calculate the two
constants 1/α and u according to the formulas (34)-(38). The values Σx_m and
Σx_m² are given at the end of Table III. Division by n = 111 gives the mean
flood ū and the mean squared flood $\overline{u^2}$ (Table IV). The Gaussian
correction being 1 + 1/110, we obtain from formula (36) the standard deviation s
(Table IV) and finally from (37) and (38) the constant 1/α and the most probable
annual flood u. From the numerical values in Table IV the linear transformation
(49) for the Rhône is

x = 2177.03 + 548.19 y.

TABLE V

Observed and theoretical distributions of flood discharges
Rhône

Reduced     Variable    Midpoints     Observed        Theoretical     Cumulative
variable                              distribution    distribution    frequency
   y           x        x + Δx/2      111 Δ'𝔚(x)      111 Δ𝔚(x)       111 𝔚(x)

 -2.75        670
 -2.50                     807             1                              0.00
 -2.25        944                                         0.01            0.01
 -2.00                    1081             1              0.34            0.07
 -1.75       1218                                         1.19            0.35
 -1.50                    1355             7              3.03            1.26
 -1.25       1492                                         6.07            3.38
 -1.00                    1629             5              9.98            7.33
 -0.75       1766                                        14.02           13.36
 -0.50                    1903            13             17.38           21.35
 -0.25       2040                                        19.49           30.74
  0.00                    2177            21             20.21           40.84
  0.25       2314                                        19.68           50.95
  0.50                    2451            19             18.26           60.52
  0.75       2588                                        16.31           69.21
  1.00                    2725            14             14.14           76.83
  1.25       2862                                        11.97           83.35
  1.50                    2999             9              9.94           88.80
  1.75       3136                                         8.15           93.29
  2.00                    3273             8              6.61           96.95
  2.25       3410                                         5.30           99.90
  2.50                    3547             6              4.23          102.25
  2.75       3686                                         3.45          104.13
  3.00                    3822             4              2.65          105.70
  3.25       3959                                         2.00          106.78
  3.50                    4096             2              1.64          107.70
  3.75       4233                                         1.28          108.42
  4.00                    4370             1              1.01          108.98
  4.25       4507                                         0.79          109.43
  4.50                    4644             0              0.61          109.77
  4.75       4781                                         0.48          110.04
  5.00                    4918                            0.38          110.25
  5.25       5055                                         0.30          110.42
  5.50                    5192                            0.23          110.55
  5.75       5329                                         0.18          110.65
  6.00                    5466                            0.27          110.73

                                         111                            111.00
TABLE VI

Observed return periods
Mississippi River, Vicksburg (Miss.) (1890-1939)

Flood discharge    Serial number    Return period
      x_m                m           log 'T(x_m)

      760                1              .0088
      866                2              .0178
      870                3              .0269
      912                4              .0362
      923                5              .0458
      946                6              .0555
      990                7              .0655
      994                8              .0758
     1018                9              .0862
     1021               10              .0969
     1043               11              .1079
     1067               12              .1192
     1060               13              .1308
     1073               14              .1427
     1186               15              .1549
     1190               16              .1675
     1194               17              .1805
     1212               18              .1939
     1230               19              .2076
     1260               20              .2219
     1286               21              .2366
     1306               22              .2518
     1332               23              .2676
     1342               24              .2840
     1353               25              .3011
     1357               26              .3188
     1457               27              .3372
     1397               28              .3566
     1397               29              .3768
     1402               30              .3980
     1406               31              .4202
     1410               32              .4437
     1410               33              .4686
     1426               34              .4949
     1453               35              .5229
     1475               36              .5529
     1480               37              .5851
     1516               38              .6198
     1516               39              .6576
     1536               40              .6990
     1578               41              .7448
     1681               42              .7959
     1721               43              .8539
     1813               44              .9208
     1822               45             1.0000
     1893               46             1.0969
     1893               47             1.2219
     2040               48             1.3980
     2056               49             1.6990
     2334               50

Σx_m = 67,780.    Σx_m² = 97,691,440.


This leads to the determination of the theoretical flood discharges. The
theoretical return periods log T(x) are given in Table II, col. 3 as a function of
the reduced variable y and of x (col. 4). The discharges x obtained by letting
y take on the values -2.75 to 6.00 in the linear transformation are given in
Table V, cols. 2 and 3 and plotted in Fig. 1. The distances Δx used in the
calculations of the theoretical discharges are 1/(4α) = 137.05.

Along the abscissa are plotted the logarithm of the return periods and the
return periods in years; along the ordinate are plotted the corresponding flood
discharges and the modal annual flood u. The straight line from the point (u, 0)
to the asymptote gives the most probable flood as a function of time. The
theoretical curve corresponds quite closely with the general course of the
observations. For small floods the theoretical return periods are practically
identical with the observed values. But for the very large floods the theoretical
curve surpasses both the exceedance and recurrence intervals.

Fig. 1. Rhône at Lyon (France) 1826-1936
Observations Table III: Recurrence intervals, Exceedance intervals;
Return periods; Theory Table II, cols. 3 and 4: Extrapolation.

The observed cumulative histogram is shown in Fig. 2. We calculate from
Table II, col. 2, the frequencies 111𝔚(x) (Table V, col. 6). These theoretical
values (x, 111𝔚(x)) are also plotted in Fig. 2. The agreement between theory
and observations is very good. 
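
The theoretical columns of Table V can be reproduced approximately from the reduced variable alone. The sketch below assumes that the cumulative probability of the distribution of largest values has the double-exponential form implied by (39), namely exp(-e^{-y}), and that each theoretical class frequency is the difference of cumulative frequencies a quarter unit of y on either side of the midpoint; the function names are hypothetical.

```python
import math

N_YEARS = 111  # number of Rhone observations

def cumulative_frequency(y, n=N_YEARS):
    """n times the probability of a flood below reduced value y (Table V, col. 6)."""
    return n * math.exp(-math.exp(-y))

def class_frequency(y_mid, half_width=0.25, n=N_YEARS):
    """Theoretical frequency of the class centred at y_mid (Table V, col. 5)."""
    upper = cumulative_frequency(y_mid + half_width, n)
    lower = cumulative_frequency(y_mid - half_width, n)
    return upper - lower

for y in (-1.5, 0.0, 1.0, 2.0):
    print(f"y = {y:5.2f}  cumulative = {cumulative_frequency(y):7.2f}  "
          f"class = {class_frequency(y):6.2f}")
```

The printed values should match the corresponding rows of Table V to within a unit or so in the second decimal, the residual being rounding in the tabulated figures.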

For the comparison of the observed and theoretical distributions of the flood
discharges we use what might be called the natural classification. For the
observations, the length of the class intervals and the beginning of the first class
interval are arbitrary. In order to obtain the observed distribution of the flood
discharges, it is natural to use the theoretical class intervals set forth in Table V,
col. 2. The data of the third column can be interpreted as the midpoints of the
class intervals given in col. 2. The frequencies for these class intervals are
obtained from Table III, and are given in Table V, col. 4. The observed
distribution is shown in Fig. 3. To obtain the corresponding theoretical distribution
we calculate from Table V, col. 6, the difference between two cumulative frequencies
disjoined by one, i.e., we pair consecutively the first and third, the second and
fourth items and so on. This theoretical distribution given in col. 5 and the
observed distribution are based on class intervals of the same length. Fig. 3
shows that the theoretical distribution Δ𝔚(x) of the largest values agrees in a
satisfactory way with the observed distribution Δ'𝔚(x) of the flood discharges.

Fig. 2. Cumulative Frequency of the Flood Discharges. Rhône, Lyon (France)
1826-1936
Observations Table III cols. 1 and 2; Theory Table V cols. 2, 3 and 6

Fig. 3. Distribution of the Flood Discharges. Rhône, Lyon (France) 1826-1936
Observations Table V cols. 2, 3 and 4; Theory Table V cols. 2, 3 and 6

Table VI, col. 1, gives the corrected* flood discharges x_m, measured in units of
1000 cubic feet per second, for the Mississippi River at Vicksburg (1890-1939),
(n = 50), arranged according to increasing magnitude; col. 2 gives the serial
number m. We calculate the logarithm of the observed return periods log
n/(n - m) (col. 3). The observations (x_m, log 'T(x_m)) and (x_{m+1}, log 'T(x_m))
are plotted in Fig. 4. The constants obtained by formulas (34)-(38) are shown
in Table IV. By (49) the theoretical floods x corresponding to the return
periods T(x) presented in Table II, col. 3, are

x = 1201.98 + 266.14 y.

These floods are given in Table II, col. 5. The class interval used is

1/(4α) = 66.5.

* These data have been put at my disposal through the courtesy of Mr. A. E. Brandt of
the U. S. Department of Agriculture.

The theoretical curve (x, log T(x)), plotted in Fig. 4, agrees in a very satisfactory
way with the observations. For the large floods the theoretical return periods
are between the exceedance and recurrence intervals.

The calculations of the theoretical return periods for other streams, e.g. the 
Columbia, Connecticut, Cumberland, Rhine, and Tennessee Rivers, for which 
reliable observations exist for more than 60 years, also show a good agreement 
with the observations. The goodness of fit diminishes for streams for which 
the number of observations is smaller and for which the data are not very 
reliable. 



Fig. 4. Mississippi River at Vicksburg, (Miss.) 1890-1939 

Observations Table VI: Recurrence intervals, Exceedance intervals, 

•-•; Return periods, ——; Theory Table II, cols. 3 and 5; Extrapolation,-. 

5. Summary and conclusions. In order to apply any theory we have to sup¬ 
pose that the data are homogeneous, i.e. that no systematical change of climate 
and no important change in the basin have occurred within the observation 
period and that no such changes will take place in the period for which extra¬ 
polations are made. It is only under these obvious conditions that forecasts 
can be made. 

The theoretical return period T(x), the mean number of years between two
annual flood discharges greater than or equal to x, is a statistical function such
as the distribution w(x) or the probabilities W(x) and P(x). There are two
sets of observed values corresponding to the theoretical set: the exceedance
interval 'T(x_m), formula (9), and the recurrence interval "T(x_m), formula (9'),
x_m being the mth flood discharge, where m is counted from below. As any
theory must include both notions, no separate theory for exceedance or
recurrence intervals is possible.

The return period T(x) of a flood discharge x is found by formula (39). For 
large values of x the flood discharge converges toward a linear function (42) of 
the logarithm of the return period. This is the scientific basis of Fuller's em¬ 
pirical formula. The two constants of our formula u and 1/a, are, respectively, 
the most probable annual flood discharge and a multiple of the standard devia¬ 
tion (28). Their values depend upon the drainage basin and known geological 
and meteorological factors. It is beyond our present task to consider the influ¬ 
ence of these factors. Our method can be summarized by the following rules: 

1) For each year find the maximum daily discharge x m (do not use momentary 
peaks) and arrange these n data in increasing magnitudes. 

2) Calculate for each discharge x_m (m = 1, 2, ..., n - 1) the values log
'T(x_m) = log n - log (n - m) and plot the curves x_m, log n/(n - m), and
x_{m+1}, log n/(n - m). These are the observed exceedance and recurrence
intervals.

3) Calculate the annual mean flood ū and the annual mean squared flood
$\overline{u^2}$; determine according to (36)-(38) the standard deviation

$s = \sqrt{\left(1 + \frac{1}{n-1}\right)\left(\overline{u^2} - \bar{u}^2\right)}$

and the two constants

$\frac{1}{\alpha} = 0.77970\,s,$

$u = \bar{u} - \frac{0.57722}{\alpha}.$

4) The theoretical flood discharges x corresponding to the logarithm of the
return period T(x) given in Table II, col. 3, are obtained by the linear
transformation

$x = u + y/\alpha,$

where y is taken from Table II, col. 1. Plot x as a function of log T(x). For
large values of x and for extrapolation it is sufficient to use the linear asymptote 
obtained graphically. 
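
Where Table II is not at hand, rule 4 can be carried out by inverting (39) directly. The inversion y = -ln(-ln(1 - 1/T)) is simple algebra on (39) rather than a formula stated in the article, and the function names below are hypothetical; the constants are those found for the Rhône.

```python
import math

def reduced_variate(T):
    """Reduced flood y whose return period is T years, from inverting (39)."""
    return -math.log(-math.log(1.0 - 1.0 / T))

def design_flood(T, u, inv_alpha):
    """Theoretical flood discharge x = u + y/alpha for return period T."""
    return u + inv_alpha * reduced_variate(T)

def asymptotic_flood(T, u, inv_alpha):
    """Linear asymptote (43): x = u + (2.30258/alpha) * log10(T)."""
    return u + 2.30258 * inv_alpha * math.log10(T)

U_RHONE, INV_ALPHA_RHONE = 2177.03, 548.19  # constants of the Rhone transformation

for T in (2, 10, 50, 100):
    exact = design_flood(T, U_RHONE, INV_ALPHA_RHONE)
    linear = asymptotic_flood(T, U_RHONE, INV_ALPHA_RHONE)
    print(f"T = {T:4d} years  x = {exact:7.1f}  asymptote = {linear:7.1f}")
```

For T = 10 the exact value falls at about y = 2.25, in line with the corresponding row of Table V, and for long return periods the exact curve and the asymptote differ by only a few units of discharge.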

The linear part of the theoretical curve (x, log T) permits of two
interpretations: First, T is the theoretical return period of a flood greater than
or equal to x; second, x is the most probable flood to be reached within T years.
The second interpretation holds for the straight line through the point (u, 0).

The figures show a close agreement between observed and theoretical values.
The observed curvature of the return periods is brought out by the theoretical
graph.

The agreement between theory and observation is excellent for floods which
correspond to reduced values of y ≤ 3. For the two or three extreme floods,
the return periods are based on a few observations and, consequently, the agree¬ 
ment is not very good. No theory can be verified by two or three observations. 
Generally speaking, the theory fits the observations as closely as could be ex¬ 
pected for such a complicated phenomenon. 

In order to make a further test of our results, we need a numerical measure 
for the weights to be given to the theoretical points. Therefore, for a given 
probability we must find the corresponding theoretical limits for the observed 
return periods. The theory of positional values will give these control curves. 
Since it was the purpose of this article to develop and make clear the basic 
method, we have refrained from introducing this subject. 

It is our claim that the calculus of probabilities, and especially the theory of
largest values, is an efficient tool for the solution of certain hydrological problems.

ON THE FOUNDATIONS OF PROBABILITY AND STATISTICS 1

By R. von Mises 
Harvard University 

1. Introduction. The theory of probability and statistics which I have been 
upholding for more than twenty years originates in the conception that the only 
aim of such a theory is to give a description of certain observable phenomena, 
the so called mass phenomena and repetitive events, like games of chance or 
some specified attributes occurring in a large population. Describing means 
here, in the first place, to find out the relations which exist between sequences 
of events connected in some way, e.g. a sequence of single games and the sequence 
composed of sets of those games or between a sequence of direct observations 
and the so called inverse probability within the same field of observations. The 
theory is a mathematical one, like the mathematical theory of electricity, based 
on experience, but operating by means of mathematical processes, particularly 
the methods of analysis of real variables and theory of sets. 

We all know very well that in colloquial language the term probability or 
probable is very often used in cases which have nothing to do with mass phe¬ 
nomena or repetitive events. But I decline positively to apply the mathemati¬ 
cal theory to questions like this: What is the probability that Napoleon was a 
historical person rather than a solar myth? This question deals with an iso¬ 
lated fact which in no way can be considered as an element in a sequence of 
uniform repeated observations. We are all familiar with the fact that, e.g. the 
word energy is often used in every day language in a sense which does not 
conform to the notion of energy as adopted in mathematical physics. This 
does not impair the value of the precise definition of energy used in physics and 
on the other hand this definition is not intended to cover the entire field of daily 
application of the term Energy. 

We discard likewise the scholastic point of view displayed in a sentence of this 
kind: “. . . that both in its meaning and in the laws which it obeys, probability 
derives directly from intuition and is prior to objective experience.” This 
sentence is quoted from a mathematical paper printed in a mathematical journal 
of 1940. The same author continues calling probability a metaphysical problem 
and speaking of the difficulties “which must in the nature of things always be 
encountered when an attempt is made to give a mathematical or physical solu¬ 
tion to a metaphysical problem.” In my opinion the calculus of probability 
has nothing to do with metaphysics, at any rate not more than geometry or 
mechanics has. 

1 Address delivered on September 11, 1940 at a meeting of the Institute of Mathematical
Statistics in Hanover, N. H.

On the other hand we claim that our theory, which serves to describe
observable facts, satisfies all reasonable requirements of logical consistency and is
free from contradictions and obscurities of any kind. I am now going to outline
the essential ideas of the theory as developed by me since 1919, and I shall have
to refer, as to the proof of its consistency, to the recent work of A. H. Copeland,
of J. Herzberg and of A. Wald. Then I will give some examples of application
in order to show how the theory works and how it applies to actual problems in
statistics.

2. The notion of kollektiv. The basic notion upon which the theory is
established is the concept of kollektiv. We consider an infinite sequence of
experiments or observations every one of which supplies a definite result in the form
of a number (or a group of numbers in the case of a kollektiv of more than one
dimension). We shall designate briefly by X the sequence of results x_1, x_2,
x_3, ... . In tossing a die we get for X an endless repetition of the integers one
to six, x = 1, 2, ..., 6. If we are interested in death probability, we observe a
large group of healthy 40 year old men and mark a one for each individual
surviving his 41st anniversary and a zero for each man who dies before, so that the
sequence x_1, x_2, x_3, ... consists of zeros and ones. In a certain sense the
kollektiv corresponds to what is called a population in practical statistics. Ex¬ 
perience shows that in such sequences the relative frequency of the different 
results (one to six in the first of our examples, one and zero in the second) varies 
only slightly, if the number of experiments is large enough. We are therefore 
prompted to assume that in the kollektiv, i.e. in the theoretical model of the 
empirical sequences or populations, each frequency has a limiting value , if the 
number of elements increases endlessly. This limiting value of frequency is 
called, under certain conditions which I shall explain later, the “probability of 
the attribute in question within the kollektiv involved.” The set of all limiting 
frequencies within one kollektiv is called its distribution. 

Let me insist on the fact that in no case is a probability value attached to a 
single event by itself, but only to an event as much as it is the element of a well 
defined sequence. It happens often that one and the same fact can be considered 
as an element of different kollektivs. It may then be that different probability 
values can be ascribed to the same event. I shall give a striking example of this, 
which we encounter in the field of actual statistical problems, at the end of this 
lecture. 

The objection has been made: Since all empirical sequences are obviously 
finite sequences, why then assume infinite kollektivs? Our answer is that any 
straight line we encounter in reality has finite length, but geometry is based on 
the notion of infinite straight lines and uses e.g. the notion of parallels which 
has no sense, if we restrict ourselves to segments of finite lengths. Another 
objection, often repeated, reads that there is a contradiction between the exist¬ 
ence of a frequency limit and the so called Bernoulli theorem which states that 
sequences of any length showing a frequency say J can also occur in cases for 
which the probability equals £. But it has been proved, in a rigorous way
excluding any doubt, that the two statements are compatible, even by explicit
construction of infinite sequences fulfilling both conditions. I would even claim
that the real meaning of the Bernoulli theorem is inaccessible to any probability 
theory that does not start with the frequency definition of probability. 

Now we are in the position to explain how our probability theory works. 
This sequence of zeros and ones 

(X) 1 0 1 | 0 0 1 | 1 0 0 | 0 1 1 | 1 1 0 | 0 1 1 | 0 1 0 | 1 1 1 ... 

may represent the outcomes of a game of chance. The ones show gains, the 
zeros losses for one of the two players. If we separate the terms of X into groups 
of three digits and replace each group by a single one or zero according to the 
majority of terms within the group, we get a new sequence 

(X') 1 0 0 1 1 1 0 1 ...

which represents the gains and losses in sets of three games. Our task is now 
to compute the distribution, i.e. the limiting frequencies of zeros and ones in 
this new sequence X', assuming the two frequencies in X are known. A sequence
can formally be considered as a unique number like a decimal fraction with an
infinite number of digits. Then the transition from X to X' can be called a
transformation of a number, X' = T(X). As our sequences have to fulfill certain
conditions, Copeland calls the sequences X, X' admissible numbers. What I
just quoted was of course a very special example of a transformation of a number. 
But we have to emphasize that all problems dealt with in probability theory, 
without any exception, have this unique form: The distribution or the limiting 
frequencies in certain sequences are given, other sequences are derived from the 
given ones by certain operations, and the distributions in these derived sequences 
have to be computed. In other words: Probability theory is the study of
transformations of admissible numbers, particularly the study of the change of
distributions implied by such transformations.
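
The special transformation quoted above, from single games to sets of three, can be written out directly. This is only a sketch: the name `majority_of_three` is an illustrative choice, and the finite list stands for the initial segment of X printed in the text.

```python
def majority_of_three(x):
    """T(X): replace each complete group of three digits by its majority digit."""
    usable = len(x) - len(x) % 3          # an incomplete trailing group is ignored
    groups = [x[i:i + 3] for i in range(0, usable, 3)]
    return [1 if sum(group) >= 2 else 0 for group in groups]

# Initial segment of X as printed above, grouped in threes.
X = [1, 0, 1,  0, 0, 1,  1, 0, 0,  0, 1, 1,
     1, 1, 0,  0, 1, 1,  0, 1, 0,  1, 1, 1]

X_prime = majority_of_three(X)
print(X_prime)   # [1, 0, 0, 1, 1, 1, 0, 1]
```

The output reproduces the initial segment of X' given in the text; computing the distribution of X' from that of X is the actual probability problem.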

We know four and only four, simple, i.e. irreducible transformations or four 
fundamental operations. They are called selection, mixing, partitioning and 
combination. By combining these basic processes we can settle all problems 
in probability theory. The formal, mathematical difficulties in carrying out the 
computation of the new distributions may become very serious in certain cases, 
particularly if we have to apply an infinite number of transformations (asymptotic
problems). But in the clearly defined framework of this theory no space
is left for any metaphysical speculations, for ideas about sufficient reason or
insufficient reason, for notions like degree of evidence or for a special kind of
probability logic and so on. And further no modification is needed for handling usual
statistical problems: Terms like inverse probability, likelihood, confidence 
degrees, etc. are justified and admitted only as far as they are capable of being 
reduced to the basic notion of kollektiv and distribution within a kollektiv. I 
will give some more details to this point later. Meanwhile let me turn to a 
general question which, in a certain way, is the crucial point in establishing the
new probability theory. 

3. Place selections and randomness. It is obvious that we have to restrict
still further the notion of kollektiv or the field of sequences which can be
considered as the objects of a probability investigation. The successive outcomes
of a game of chance differ very clearly from any regular sequence as defined by a
simple arithmetical law, e.g. the regularly alternating sequence 0 1 0 1 0 1 0 1 ... .
A typical property which singles out the irregular or random
sequences and which has to be reproduced in every probability theory is that, if
p is the probability of encountering a one in the sequence, then $p^2$ is the
probability of two ones following each other immediately. Any probability theory has
to introduce an axiom which enables us to deduce this theorem and others of a
similar type. The question is only how to find a sufficiently general and
consistent form for it. The procedure I have chosen consists in using a special kind
of transformation of a sequence, which I call a place selection. 

A place selection is defined by an infinite set of functions $s_n(x_1, x_2, \ldots, x_{n-1})$
where $x_1, x_2, x_3, \ldots$ are the digits of an admissible number or a kollektiv and
$s_n$ has one of the two values zero or one. Here $s_n = 1$ means that the nth digit
of the sequence is retained, $s_n = 0$ means that it is discarded. The decision
about retaining or discarding the nth element depends, as you see, only on the
preceding values $x_1, x_2, \ldots, x_{n-1}$, but not on $x_n$ or the following digits. Example
of a place selection:

    $s_n = 1$, if $x_{n-1} = 0$ for prime numbers n,
               if $x_{n-1} = 1$ for n not prime,

    $s_1 = 1$, and $s_n = 0$ in all other cases.

Experience shows that, if we apply such a place selection to the sequence X 
of outcomes of a game of chance, we get a new, selected sequence S(X) in which 
the frequencies of gains and losses are about the same as in X. This fact or 
the practical impossibility of a gambling system suggests the adoption of the 
following procedure in handling transformations of admissible numbers. 
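
The example of a place selection given above can be tried out on a simulated game and, for contrast, on the regularly alternating sequence. The sketch below assumes a seeded pseudo-random generator as a stand-in for the game of chance; the names `is_prime`, `s` and `place_selection`, the seed and the value p = 0.3 are illustrative assumptions.

```python
import random

def is_prime(n):
    """Trial division; adequate for the small indices used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def s(n, prefix):
    """Example rule from the text; prefix holds x_1, ..., x_{n-1}."""
    if n == 1:
        return 1
    previous = prefix[-1]                       # x_{n-1}
    if is_prime(n):
        return 1 if previous == 0 else 0
    return 1 if previous == 1 else 0

def place_selection(x, rule):
    """Retain x_n exactly when rule(n, [x_1, ..., x_{n-1}]) equals 1."""
    selected, prefix = [], []
    for n, digit in enumerate(x, start=1):
        if rule(n, prefix) == 1:                # the rule is consulted before x_n is seen
            selected.append(digit)
        prefix.append(digit)
    return selected

rng = random.Random(1)                          # arbitrary fixed seed
p, length = 0.3, 100_000
X = [1 if rng.random() < p else 0 for _ in range(length)]
selected = place_selection(X, s)
print(sum(X) / len(X), sum(selected) / len(selected))

alternating = [(n - 1) % 2 for n in range(1, length + 1)]    # 0 1 0 1 ...
from_alternating = place_selection(alternating, s)
print(sum(alternating) / length, sum(from_alternating) / len(from_alternating))
```

In the simulated game the frequency of ones is practically unchanged by the selection, whereas in the alternating sequence the same selection retains almost nothing but zeros, which is why such a regular sequence cannot be admitted as a kollektiv.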

First, if within a certain investigation the transformation applied to X is a 
place selection, we assume that the distribution in X' = S(X) is the same as
in X: distr S(X) = distr X. Second, if a general transformation T is applied
to X, say X' = T(X), then we examine whether the existence of a place selection
S that changes the distribution in X' (so as to have distr S(X') ≠ distr X')
implies the existence of a place selection $S_1$ that would affect the distribution in
X (so as to give distr $S_1$(X) ≠ distr X). If this is the case, we say that X' is
a kollektiv, provided that the original sequence X was considered to be a
kollektiv. Take e.g. for X the sequence resulting from tossing a die endlessly, and
call $p_1, p_2, \ldots, p_6$ the limiting frequencies of the six possible outcomes 1, 2, ..., 6.
The transformation T may consist in replacing every 1 in the sequence X by a
2, every 3 by a 4, and every 5 by a 6. The new sequence consists of only three
different kinds of elements 2, 4, 6 and therefore its distribution includes only
three values $p_2', p_4', p_6'$ where evidently $p_2' = p_1 + p_2$ etc. Here it is almost
obvious that if a place selection applied to X' changes the value of $p_2'$, the same
selection if applied to X must change either $p_1$ or $p_2$. So, if the original sequence
X was considered as a kollektiv, X' has to be admitted too.

Now the question arises whether this procedure is in itself consistent or 
whether it can lead to contradictions. We were concerned up to now with 
kollektivs the elements of which belong to a finite set of distinct numbers 
$e_1, e_2, \ldots, e_k$ and the distributions of which are therefore defined by k
non-negative values $p_1, p_2, \ldots, p_k$ with the sum 1. In this case it was pointed out
by Wald and by Copeland that, if an arbitrary distribution and an arbitrary
countable set $\Sigma$ of place selections are given, there exists a continuum of
sequences every one of which has the given distribution, which is not affected by
any place selection belonging to $\Sigma$. Now it may be supposed that in a concrete
problem a sequence X' is derived from a sequence X by a finite number of
fundamental operations involving a finite set $\Sigma'$ of place selections. Another
finite set $\Sigma''$ may consist of selections employed in establishing that certain
sequences used in the derivation of X' are “combinable” ones. Finally an
arbitrary countable set $\Sigma$ of selections S may be assumed. According to our
procedure we have shown that to any place selection S which affects the
distribution in X' corresponds a certain $S_1$ which, when applied to X, changes the
distribution of X. All these $S_1$ corresponding to the elements S of $\Sigma$ form a
countable set $\Sigma_1$. Now the set $\Sigma_2$ including $\Sigma'$, $\Sigma''$, $\Sigma_1$ and also including all
products of two of its own elements is a countable set too. What we use in
computing the distribution of X' is only the fact that the given sequence X is
unaffected by the selections that are elements of $\Sigma_2$. It follows from the above
quoted results that we can substitute for X a numerically specified sequence
and carry out all operations upon this specified sequence. So it is proved that
no contradiction can arise in computing the final probability according to our
conception.

I cannot enter here into a discussion of the more complicated case where the
range within which the elements of a kollektiv vary is an infinite one, either a
countable set or a continuum. All principal problems connected with
establishing the notion of kollektiv can be settled satisfactorily, at any rate, by
considering those general forms of sequences as limiting cases of kollektivs with a
finite set of attributes.

4. Example: Set-of-games problem. I want to present now a simple, but
instructive example to show how the theory works and what task a mathematical 
foundation of the calculus of probability has to achieve. Let us recall the two 
sequences X and X' composed of zeros and ones of which we spoke above. The 
first represented the outcomes of a sequence of single games, the second the 
outcomes of triple sets of those games. If X is considered as a kollektiv with 
given probabilities p and q for one and zero, it is easy to deduce the
corresponding values p' and q' for X' and to show that X' is a kollektiv too. We begin by
carrying out three selections which single out from the original sequence
$x_1, x_2, x_3, \ldots$ first, the elements $x_1, x_4, x_7, \ldots$, second, the elements
$x_2, x_5, x_8, \ldots$ and third, the elements $x_3, x_6, x_9, \ldots$. It can be shown by means of certain
further place selections that these three kollektivs which we call $X_1, X_2, X_3$
are combinable. That means that combining the corresponding elements of
the three sequences like $x_1 x_2 x_3$, $x_4 x_5 x_6$, $x_7 x_8 x_9, \ldots$ leads to a new three
dimensional kollektiv $X_0$ in which each permutation of three digits 0 and 1 has a
probability equal to the corresponding product of p- and q-factors. For
instance the probability of encountering the group 111 is $p^3$ and for the group 110
it is $p^2 q$. Now we operate a mixing upon $X_0$ by collecting all permutations
with two or three ones. We find in a well known way the sum $p^3 + 3p^2 q$ for
the probability p' of ones in the sequence X'. So far the result is very well
known and can be reached (in my opinion, in a very incomplete and
unsatisfactory way) also by the classical methods.
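
The mixing step can be checked by enumerating the eight possible groups of three digits. This is a minimal sketch; the function name `majority_probability` and the value p = 0.4 are illustrative, and the product rule for the groups is taken over from the combination step described above.

```python
from itertools import product

def majority_probability(p):
    """Sum the product probabilities of all triples containing two or three ones."""
    q = 1 - p
    total = 0.0
    for triple in product((0, 1), repeat=3):
        ones = sum(triple)
        if ones >= 2:                           # mixing: collect the winning triples
            total += p**ones * q**(3 - ones)
    return total

p = 0.4
print(majority_probability(p), p**3 + 3 * p**2 * (1 - p))
```

Both printed numbers agree, since exactly one triple has three ones and three triples have two ones.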

But what I want to discuss here is a slightly modified question. If the 
sequence X means gains and losses for single games and if the arrangement for 
sets of three games is made as indicated before, then in a real play the gains 
and losses of sets are counted in a different way. For, if the first two games of 
a set are both won or lost by the same player, the fate of the set is decided and 
there is no sense to play the third game. So the loss of the second set in our 
example will already be recognized after the fifth game and the actual sixth 
game will be considered as the first game of the third set. In this way the 
original sequence X decomposed into groups of two or three games 

(X) 1 0 1 | 0 0 | 1 1 | 0 0 | 0 1 1 | 1 1 | 0 0 | 1 1 | 0 1 0 | 1 1 | ...

leads to a new sequence X"

(X") 1 0 1 0 1 1 0 1 0 1 ...

which is obviously different from X'. Everyone familiar with the usual
handling of the probability concept will say that in X" the probabilities of zeros and
ones must be the same as in X'. But a mathematical foundation of the theory of
probability, if it deserves this name, has to clear up the question: From what
principles or particular assumptions and by what inferences may we deduce the 
equality of the limiting frequencies in X' and X"? 
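
The decomposition into sets that stop as soon as they are decided can be sketched as follows. The names `sets_with_early_stop` and `early_stop_probability` are illustrative; the second function simply assumes the product rule for successive games, which is exactly the assumption whose justification the question above asks for, so it illustrates the expected equality without proving it.

```python
def sets_with_early_stop(x):
    """X": a set ends as soon as one player has won two of its games."""
    results, ones, zeros = [], 0, 0
    for digit in x:
        if digit == 1:
            ones += 1
        else:
            zeros += 1
        if ones == 2 or zeros == 2:
            results.append(1 if ones == 2 else 0)
            ones = zeros = 0
    return results                      # an unfinished trailing set is dropped

def early_stop_probability(p):
    """Probability of a won set: 1 1, or 1 0 1, or 0 1 1 (product rule assumed)."""
    q = 1 - p
    return p * p + p * q * p + q * p * p

X = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
     1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(sets_with_early_stop(X))          # [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]

p = 0.4
print(early_stop_probability(p), p**3 + 3 * p**2 * (1 - p))
```

Algebraically $p^2 + 2p^2 q = p^2(1 + 2q)$ and $p^3 + 3p^2 q = p^2(p + 3q) = p^2(1 + 2q)$, so the two expressions coincide for every p.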

There is no difficulty in solving this problem from the point of view of the 
frequency theory. We have only to apply somewhat different place selections
instead of those used above, which lead to the kollektivs $X_1, X_2, X_3$. I showed
elsewhere how the general set-of-games problem can be satisfactorily treated in 
this way. Here I want to stress only that the problem as a whole is completely 
inaccessible by any of the other known approaches to probability theory. The 
classical point of view which starts with the notion of equally likely cases and 
rests upon a rather vague idea of the relationship between probability and 
sequences of events does not even allow the formulation of the problem. In
the so called modernized classical theory, as proposed by Fréchet, probabilities
are defined as “physical magnitudes of which frequencies are measures.”
Fréchet would say that the frequencies both in X' and in X" are measures of
the same quantity. But why? We face here obviously a mathematical
question which cannot be settled by referring to physical facts. It is clear that the
equality of the distributions in the two sequences X' and X" is due to the
randomness or irregularity of the original sequence X. No theory which does
not take into account the randomness, which avoids referring to this essential
property of the sequences dealt with in probability problems, can contribute 
anything toward the solution of our question. 

I have to make some special remarks about the so-called measure theory of 
probability.²

5. Probability as measure. Up to now we have been concerned only with 
the simplest type of kollektivs, namely, with those sequences the elements of 
which belong to a finite set of numbers so as to have a distribution consisting 
of a finite number of finite probabilities with the sum 1. It may be true that 
all practical problems, in a certain sense, fall into this range. For, the single 
result of an observation is always an integer, the number of smallest units 
accessible to the actual method of measuring. Nevertheless in many cases it 
is much more useful to adopt the point of view that the possible outcomes of an 
experiment belong to a more general set of numbers, e.g. to a continuous segment 
or any infinite variety. If we include the case of kollektivs of more than one 
dimension, we have to consider a point set in a k-dimensional space (where
even k may be infinite) as the label set or attribute set of the kollektiv. In
order to define the probability in this case we have to choose a subset A of the
label set and to count among the first n elements the number $n_A$ of those elements
the attributes of which fall into A. Then the quotient $n_A : n$ is the frequency,
and its limiting value for n infinite will be called the probability of the attribute
falling into A within the given kollektiv. 

It was rightly stressed by many authors that in the case of an infinite label set 
some additional restrictions must be introduced. In particular A. Kolmogoroff 
set up a complete system of such restrictions. We cannot ask for the
existence of the limiting frequency in any arbitrary subset A. It will be sufficient
to assume that the limit exists for a certain Körper or a certain additive family
of subsets. If it exists for two mutually exclusive subsets A and B, the limit 
corresponding to A + B will be, by virtue of the original definition, the sum of 
the limits connected with A and B. We can now insert a further axiom involving 
the complete additivity of the limiting values. So we arrive at the statement
that probability is the measure of a set. All axioms of Kolmogoroff can be
accepted within the framework of our theory as a part of it, but in no way as a
substitute for the foregoing definition of probability.

2 What I call measure theory here is essentially that proposed by Kolmogoroff in his
pamphlet of 1933. As to the new theory developed by Doob in his following paper (where
instead of the label space the space of all logically possible sequences is used in establishing
the measures) see my comment on page 215.

Occasionally the expression probability as measure theory is used in a
different sense. One tries to base the whole theory on the special notion of a set
of measure zero. One of the basic assumptions in my theory is that in the 
sequence of results we obtain in tossing a so called correct die the frequency, 
say of the point 6, has a certain limiting value which equals 1/6. A different 
conception consists in stating that anything can happen in the long run with a 
correct die, even that an uninterrupted sequence of six’s or an alternating
sequence of two’s and four’s or so on may appear. Only all these events which
do not lead to the limiting frequency 1/6 form, together as a whole, a set of 
events of measure zero. Instead of my assumption: the limiting value is 1/6 
we should have to state: It is almost certain that a limit exists and equals 1/6. 
Nothing can be said against such an alluring assumption from an empirical 
standpoint, since actual experience extends in no case to an infinite range of 
observations. The only question is whether the assumption is compatible with
a complete and consistent theory. I cannot see how this may be achieved. 
Before saying that a set has measure zero we have to introduce a measure system 
which can be done in innumerable ways. If e.g. we denote the outcome six by a 
one and all other outcomes 1 to 5 by zero, we get as the result of the game with 
a die an infinite sequence of zeros and ones. It has been shown by Borel that
according to a common measure system the set of all 0, 1 sequences which do not
have the limiting frequency 1/2 has the measure zero. In this way it turns out
to be almost certain that the limiting frequency of the outcome six in the case
of a correct die is 1/2. Other values for the limit can be obtained by a similar
inference. It is a correct but misleading idea that the measure zero is unaffected
by a regular (continuous) transformation of the assumed measure system, since 
in our field of problems different measures which are not obtained from one 
another by a regular transformation have equal rights. So, saying that a certain 
set has the measure zero makes in our case no more sense than to state that an 
unknown length equals 3 without indicating the employed unit. 

In recapitulating this paragraph I may say: First, the axioms of Kolmogoroff 
are concerned with the distribution function within one kollektiv and are 
supplementary to my theory, not a substitute for it. Second, using the notion of
measure zero in an absolute way without reference to the arbitrarily assumed 
measure system, leads to essential inconsistencies. 

6. Statistical estimation. Let me now turn to the last point, the application 
of probability theory to one of the most widely discussed questions in today’s 
statistical research: the so-called estimation problem. Many strongly divergent 
opinions are facing each other here. I think that the probability theory based 
on the notion of kollektiv is best able to settle the dispute and to clear up the 
difficulties which arose in the controversies of different writers. 
We may, without loss of generality, restrict ourselves to the simplest case
of a single statistical variable x and a single parameter $\vartheta$, where x of course may
be the arithmetical mean of n observed values. Here (and likewise in the case
of more variables and more parameters) we have to distinguish carefully among
four different kollektivs which are simultaneously involved in the problem.
The range within which both x and $\vartheta$ vary will be assumed to be a continuous
interval so that all distributions will be given by probability densities.

The first kollektiv we deal with is a one-dimensional one where the probability 
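of x falling into the interval x, x + dx depends on x and on a parameter $\vartheta$. If

(1)    $p(x \mid \vartheta)$

denotes the corresponding density and the limits A, B within which x possibly
falls depend on $\vartheta$ too, we have

(1')    $\int_{A(\vartheta)}^{B(\vartheta)} p(x \mid \vartheta)\,dx = 1$ for each $\vartheta$.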

In order to fix the ideas we may imagine that the first kollektiv consists in 
drawing a number x out of an urn and that $\vartheta$ characterizes the contents of the
urn. Asking for an estimate of $\vartheta$ implies the assumption that different possible
urns are at our reach every one of which can be used for drawing the x. The
$\vartheta$-values for the different urns fall into a certain interval C, D. It is usual to
suppose that the urns are picked out at random so as to give another one-dimensional
kollektiv with the independent variable $\vartheta$. Let $p_0(\vartheta)\,d\vartheta$ be the probability of
picking an urn with the characteristic value falling into the interval $\vartheta$, $\vartheta + d\vartheta$.
This density

(2)    $p_0(\vartheta)$

is often called the prior or a priori probability of $\vartheta$. As the range within which
$\vartheta$ varies is confined by the constants C and D, we have obviously

(2')    $\int_C^D p_0(\vartheta)\,d\vartheta = 1.$

Now from these two one-dimensional kollektivs with the variables x in the
first, $\vartheta$ in the second, we deduce by combination (multiplication) a
two-dimensional kollektiv with the density function

(3)    $P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta).$

The individual experiment which forms the element of this third kollektiv
consists of picking at random an urn and drawing afterwards from this urn. Both
x and $\vartheta$ are now independent variables (attributes of the kollektiv) and it is easy
to see that it follows from (1') and (2')

(3')    $\iint P(\vartheta, x)\,dx\,d\vartheta = \int_C^D p_0(\vartheta)\,d\vartheta \int_{A(\vartheta)}^{B(\vartheta)} p(x \mid \vartheta)\,dx = 1.$
We will return later to this two-dimensional kollektiv. Let us, first, derive
from it, by applying the operation of partitioning (Teilung), our fourth and last
kollektiv which is one-dimensional again. Partitioning means that we drop
from the sequence of experiments which form the third kollektiv all those for
which the x-value falls outside a certain interval x, x + dx; and that in this
way we consider a partial sequence of experiments with only the one variable $\vartheta$.
The distribution of $\vartheta$-values within this sequence with quasi-constant x is given,
according to the well known rule of division or rule of Bayes (a rule which can
be proved mathematically) by³

(4)    $p_1(\vartheta \mid x) = \frac{P(\vartheta, x)}{\int_C^D P(\vartheta, x)\,d\vartheta} = c(x)\,p_0(\vartheta)\,p(x \mid \vartheta).$

It follows immediately that

(4')    $\int_C^D p_1(\vartheta \mid x)\,d\vartheta = 1.$

This function $p_1$ of $\vartheta$ depending on the parameter x is generally called the
posterior or a posteriori probability of $\vartheta$.

If $p_1(\vartheta \mid x)$ can be computed according to the formula (4), every question
concerning the “presumable” value of $\vartheta$ as drawn from the outcome x of an
experiment is completely answered. We can find indeed, by integration, the
probability which corresponds to any part of the interval C, D of $\vartheta$ and so the
estimation problem is definitely solved. But the trouble is that in most cases of
practical application nothing or almost nothing is known about the prior
probability $p_0(\vartheta)$ which appears as a factor in the expression of $p_1$. Hence arises
the new question: What can we say about the $\vartheta$-values without having any
information about its prior probability? This is the estimation problem as it is generally
conceived today.
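
How strongly the posterior probability may depend on the prior can be seen in a small numerical sketch of formula (4). Everything specific here is an assumption made for illustration: the model (x is the mean of n = 10 zero-one observations), the interval C = 0, D = 1, the two priors, and the names `likelihood`, `posterior` and `mean`; integrals are replaced by midpoint sums on a grid.

```python
from math import comb

def likelihood(x, theta, n=10):
    """p(x | theta) for x = k/n, the mean of n zero-one observations (assumed model)."""
    k = round(x * n)
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def posterior(prior, x, grid):
    """Formula (4) on a grid: p0 * p(x | theta), divided by its integral over C..D."""
    joint = [prior(t) * likelihood(x, t) for t in grid]
    step = grid[1] - grid[0]
    norm = sum(joint) * step                  # midpoint sum standing in for the integral
    return [j / norm for j in joint]

m = 2000
grid = [(i + 0.5) / m for i in range(m)]      # midpoints of the interval C = 0 .. D = 1
step = 1 / m

def mean(density):
    """Mean of theta under a density tabulated on the grid."""
    return sum(t * d for t, d in zip(grid, density)) * step

flat = posterior(lambda t: 1.0, 0.7, grid)          # constant prior
rising = posterior(lambda t: 2.0 * t, 0.7, grid)    # prior favouring large theta
print(round(mean(flat), 4), round(mean(rising), 4))
```

For the same observed x = 0.7 the two priors give noticeably different posterior means, which is the difficulty described in the text for small n.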

The first successful approach to the answering of this question was made by 
Gauss. If we do not know $p_1$, we know however, except for a constant factor,
the quotient $p_1/p_0$, posterior probability to prior probability, which equals
$c\,p(x \mid \vartheta)$. The maximum of this quotient must be greater than one, since the
average values of both $p_0$ and $p_1$ are the same. So the maximum means the
point of the greatest increase produced by the observed experimental value of x
upon the probability of $\vartheta$. It seems reasonable to assume the $\vartheta$-value for which
the ratio $p_1/p_0$ reaches its maximum as an estimate for $\vartheta$: It is the value upon
which the greatest emphasis is conferred by the observation. This idea,
originally proposed by Gauss in his theory of errors, has been later developed chiefly
by R. A. Fisher, and is known today as the maximum likelihood method. Calling
the ratio $p_1/p_0$ likelihood seems indeed an adequate nomenclature.
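
The position of the maximum of the ratio of posterior to prior probability does not depend on the prior, only the constant factor does. A minimal sketch, again under an assumed model (7 ones among n = 10 zero-one observations) and with illustrative names `likelihood` and `likelihood_ratio`:

```python
from math import comb

def likelihood(k, n, theta):
    """p(x | theta) for k ones among n zero-one observations (assumed model)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def likelihood_ratio(k, n, prior, grid):
    """Posterior over prior: p(x | theta) divided by the integral of p0 * p(x | theta)."""
    step = grid[1] - grid[0]
    norm = sum(prior(t) * likelihood(k, n, t) for t in grid) * step
    return [likelihood(k, n, t) / norm for t in grid]

m = 1000
grid = [(i + 0.5) / m for i in range(m)]             # midpoints of (0, 1)
ratio = likelihood_ratio(7, 10, lambda t: 2.0 * t, grid)

best = max(range(m), key=lambda i: ratio[i])         # index of the largest ratio
print(grid[best], ratio[best])
```

The maximum lies at the grid point next to x = 0.7, and its value exceeds one, in agreement with the remark on the average values of prior and posterior.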


3 For brevity Bayes’ rule is employed in the text as in the case of a discontinuous
distribution. The correct procedure in the case of a continuous x would require that we first
use finite intervals and then pass to the limit.
The method of estimation used most frequently today is not the maximum
likelihood method, but the so called confidence interval method, inaugurated
by R. A. Fisher and now successfully extended and applied by J. Neyman. This
method uses the third of the above mentioned kollektivs instead of the fourth,
i.e. the two-dimensional probability $P(\vartheta, x)$. At first sight it seems hopeless
to use this function which includes the unknown prior probability $p_0(\vartheta)$ as a
factor. But it turns out, as Neyman has shown⁴ (and this is the decisive idea
of the confidence interval method), that we can indicate in the x, $\vartheta$-plane special
regions for which the probability $\iint P(\vartheta, x)\,dx\,d\vartheta$ is independent of $p_0(\vartheta)$. In
fact, if we point out for every $\vartheta$ such an interval $x_1, x_2$ as to have

(5)    $\int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\,dx = \alpha, \qquad 0 < \alpha < 1,$

it follows immediately from (2') and (5) for the region covered by these intervals

(6)    $\iint P(\vartheta, x)\,dx\,d\vartheta = \int_C^D p_0(\vartheta)\,d\vartheta \int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\,dx = \alpha.$

For given $\alpha$ the intervals can be chosen in different ways. If we choose $x_1 = A$
for $\vartheta = C$ and $x_2 = B$ for $\vartheta = D$, we get a strip or belt, as shown in Fig. 1,
which supplies for every given x a smallest value $\vartheta_1$ and a greatest value $\vartheta_2$.
The definition of our third kollektiv leads to the conclusion: If we predict each
time a certain x is observed that $\vartheta$ lies between the corresponding $\vartheta_1$ and $\vartheta_2$, then
the probability is $\alpha$ that we are right, whatever the prior probability may be.⁵ It is
understood that in this argument both x and $\vartheta$ are variables the values of which
may change from one trial to the next. I cannot agree with the statement,
which is often made, that x only is a variable and $\vartheta$ a constant or that we are
only interested in one specified value of $\vartheta$. In no way is it possible, in the
framework of the confidence limits method, to avoid the idea of a so-called
superpopulation, i.e. the existence of a manifold of urns every one of which forms
a kollektiv.⁶ Thus no contradiction and no antagonism exists between this
method and the Bayes formula. Only a different kollektiv, a two-dimensional
instead of a one-dimensional, is here considered.

4 J. Neyman, Roy. Stat. Soc. Jour., Vol. 97 (1934), pp. 590-92.

5 After my lecture Dr. A. Wald called my attention to Neyman's suggestion; namely
that this statement can be generalized by admitting that the infinite sequence of $\vartheta$-values
which results from picking out successively the urns for drawing a number x, does not
fulfill the conditions of a kollektiv. So, instead of the terms “whatever the prior
probability may be” we can say “whatever the method of picking out the urns may be.” In
fact, let us consider the case where $\vartheta$ can assume only a finite number of values
$\vartheta_1, \vartheta_2, \ldots, \vartheta_k$. Among the n first trials let $n_\kappa$ be the number of cases where
$\vartheta = \vartheta_\kappa$ and $n'_\kappa \le n_\kappa$ the number of cases where $\vartheta = \vartheta_\kappa$ and x falls into the
interval $x_1(\vartheta_\kappa)$, $x_2(\vartheta_\kappa)$. The relative

I have no time to enter here in a discussion of the very interesting
developments of Neyman’s theory which are intended to supply additional conditions
in order to determine the arbitrary choice of the x-intervals in a unique way.
May I only mention that what is called in Neyman’s theory the probability of a
second type error in testing the hypothesis $\vartheta = \vartheta_0$ is given by the expression

(7)    $\iint P(\vartheta, x)\,dx\,d\vartheta = \int_C^D p_0(\vartheta)\,d\vartheta \int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\,dx.$

If we want to determine the confidence belt or the intervals $x_1, x_2$ in such a way
as to minimize this expression independently of the function $p_0(\vartheta)$, we obtain
Neyman’s maximum power condition

(8)    $\int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\,dx \equiv F(\vartheta, \vartheta_0) = \min.$ for each pair $\vartheta, \vartheta_0$.

This condition, it is well known, cannot be fulfilled under general assumptions
for $p(x \mid \vartheta)$. Moreover the above-mentioned boundary conditions $x_1(C) = A(C)$
and $x_2(D) = B(D)$ (or similar ones in other cases) have to be considered
too. If they are not satisfied, the statement which can be made with probability
$\alpha$ would include the prediction that certain x-values are impossible. Except
for this case the above formulated theorem is equally valid for every region
determined according to (5).

It is clear that if the original distribution is given by a regular, slightly
varying function $p(x \mid \vartheta)$, the confidence limits method cannot give very substantial
results. Let us take e.g. for $p(x \mid \vartheta)$ the uniform distribution

(9)    $p(x \mid \vartheta) = 1/\vartheta$ for $0 \le x \le \vartheta$, $0 \le \vartheta \le 1$.


frequency of correct predictions is then $(n'_1 + n'_2 + \cdots + n'_k) : n$ where n equals
$n_1 + n_2 + \cdots + n_k$. If n tends to infinity, at least one part of the $n_\kappa$ must become
infinite. For those the ratio $n'_\kappa : n_\kappa$ tends to $\alpha$ according to (5) while the other terms
(with finite $n_\kappa$ and $n'_\kappa$) have no influence. So the limiting value of the frequency
$(n'_1 + n'_2 + \cdots + n'_k) : n$ equals in any event $\alpha$. This generalization does not apply,
if we ask for the probability of a second type error of the hypothesis $\vartheta = \vartheta_0$. Here the
existence of the prior probability $p_0$ is essential.

6 According to the generalization supplied by Neyman’s point of view (Phil. Trans.
Roy. Soc., Vol. A-236 (1937), pp. 333-380) which is discussed in footnote 5, the
superpopulation does not necessarily satisfy the conditions of a kollektiv.
We have here A = 0, C = 0, D = 1 and the domain in which x and $\vartheta$
vary is the 45° right triangle shown in Fig. 2. Whatever $p_0(\vartheta)$ may be, the
integral of $P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta)$ over this domain is 1 and if we omit the
part of the triangle on the left of the straight line $x = (1 - \alpha)\vartheta$, the integral
over the remaining part is $\alpha$. For $\alpha = 0.90$, a statement which can be made
with a probability of 90% reads: The value of $\vartheta$ lies between x and 10x. On
the other hand we know from the very beginning with 100% certainty that $\vartheta$
lies between x and 1, so that for $x \ge 0.1$ the statement is futile. (If one chooses
as confidence belt the part on the left of the straight line $x = \alpha\vartheta$, the statement
would run: $\vartheta$ lies between 1.1 x and 1 and values of x greater than 0.9 are
impossible.) If we apply in this case the Bayes formula, we find that the
outcome depends to the highest extent on what is known about the prior
probability $p_0(\vartheta)$.
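
That the statement "$\vartheta$ lies between x and 10x" holds with frequency 0.90 whatever the way of picking the urns can be checked by simulation for the uniform distribution (9). The sketch assumes a seeded pseudo-random generator; the function name `coverage`, the three ways of drawing $\vartheta$ and the number of trials are illustrative choices.

```python
import random

def coverage(draw_theta, alpha, trials, rng):
    """Fraction of trials in which x <= theta <= x / (1 - alpha)."""
    hits = 0
    for _ in range(trials):
        theta = draw_theta(rng)                 # pick an urn
        x = rng.uniform(0.0, theta)             # draw from it: density 1/theta on [0, theta]
        if x <= theta <= x / (1 - alpha):
            hits += 1
    return hits / trials

rng = random.Random(2)                          # arbitrary fixed seed
ways_of_picking = {
    "uniform on (0, 1)": lambda r: r.random(),
    "concentrated near 1": lambda r: r.random() ** 0.1,
    "always the same urn, 0.5": lambda r: 0.5,
}
for name, draw in ways_of_picking.items():
    print(name, coverage(draw, 0.90, 200_000, rng))
```

All three frequencies come out close to 0.90, although for observed values of x above 0.1 the upper limit 10x exceeds 1 and the statement conveys nothing beyond what was known beforehand.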

In most cases however which present themselves in practical statistics the
original density function $p(x \mid \vartheta)$ has a different character from that assumed in
(9). It depends generally on an integer n and the distribution is concentrated
more and more when n increases. (We may define here concentration as
standard deviation tending towards zero. The integer n means in general the
number of basic experiments.) We have e.g. in the so-called Bayes problem
where x is the arithmetical mean of n observations the asymptotic expression
for p:


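(10) $p(x \mid \vartheta) \sim \sqrt{\dfrac{n}{2\pi\vartheta(1 - \vartheta)}} \, \exp\left[-\dfrac{n(x - \vartheta)^2}{2\vartheta(1 - \vartheta)}\right], \qquad 0 \leq x \leq 1.$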


If we denote by $\Phi$ the probability integral

(11) $\Phi(x) = \dfrac{2}{\sqrt{\pi}} \int_0^x e^{-u^2} \, du,$

the $x$-intervals corresponding to a given probability value $\alpha$ are defined by


(12) *i - * - (, 


*» 


o-M *(« 


a . 
If $n$ has a large value, the $\xi$'s are very small and we get a narrow belt along the
straight line $x = \vartheta$ as shown in Fig. 3 for $\alpha = 0.90$ and $n$ about 100. The
prediction which can be made with the probability $\alpha$ reads approximately

(13) * - * g d s£ * + v where f (n - a. 

On the other hand it is well known that in this case the Bayes formula supplies
a posterior probability $p_1(\vartheta \mid x)$ which turns out to be more and more independent
of the prior probability $p_0(\vartheta)$ when $n$ increases. It has been shown that the
asymptotic expression for $p_1(\vartheta \mid x)$, whatever $p_0(\vartheta)$ may be, is

(14) ~ *■ 

It follows that, on the basis of the Bayes formula, we can predict for every
single value of $x$ with the probability $\alpha$ that $\vartheta$ lies between the above given
limits (13). This is more than the confidence limits method supplies, but the
result is subjected to the restriction that $p_0(\vartheta)$ is a continuous function. However,
for large values of $n$ (generally this means for large numbers of basic
experiments) the outcomes of both methods are essentially the same.
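
The agreement of the two methods for large $n$ can be illustrated numerically. The following is a rough sketch, not the computation of the paper: it assumes the Bayes problem with $k$ successes in $n$ trials, uses the familiar large-sample limits $x \pm z\sqrt{x(1 - x)/n}$ as a stand-in for the confidence limits, and evaluates the posterior probability of that interval by simple numerical integration for two hypothetical continuous priors.

```python
import math

def posterior_mass(prior, k, n, lo, hi, grid=20_000):
    """Posterior probability that theta lies in [lo, hi] (midpoint integration)."""
    num = den = 0.0
    for i in range(grid):
        t = (i + 0.5) / grid
        w = prior(t) * t**k * (1.0 - t)**(n - k)   # prior times binomial likelihood
        den += w
        if lo <= t <= hi:
            num += w
    return num / den

n, k = 100, 40                 # hypothetical data: 40 successes in 100 trials
x = k / n
z = 1.6449                     # normal quantile for a two-sided 90% statement
eps = z * math.sqrt(x * (1.0 - x) / n)
lo, hi = x - eps, x + eps      # ASSUMED large-sample limits for alpha = 0.90

mass_flat = posterior_mass(lambda t: 1.0, k, n, lo, hi)      # constant prior
mass_tilted = posterior_mass(lambda t: t * t, k, n, lo, hi)  # prior favoring large theta
print(f"limits: [{lo:.3f}, {hi:.3f}]")
print(f"posterior probability, flat prior:   {mass_flat:.3f}")
print(f"posterior probability, tilted prior: {mass_tilted:.3f}")
```

Both posterior probabilities should lie near 0.90 already for $n = 100$, and the influence of the prior should shrink further as $n$ grows.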

Let me recapitulate in three brief sentences the essential results we have 
found in the problem of estimation. 

1. There is no contradiction of any kind between the Bayes formula and the 
confidence limits method and no difference at all in the underlying probability 
concept. In both methods the idea of a sort of “super-population” is used. 
Only two different kollektivs are considered in both cases. 

2. If the original distribution has a regular, slightly varying density function
$p(x \mid \vartheta)$, the Bayes method gives a complete answer when the prior probability
is known and no answer when it is unknown. The confidence limits method gives
in both cases a definite solution; it lies in the nature of things that the solution
cannot be very substantial if $p(x, \vartheta)$ is only slightly varying.

3. If the original distribution $p(x \mid \vartheta)$ depends on a further parameter $n$ and
becomes concentrated more and more with increasing $n$, both approaches give,
for large $n$, asymptotically about the same results.

It is not intended by these remarks to impair the value of the confidence 
limits method which both from theoretical and from practical point of view 
deserves our attention. But the rather inconceivably aggressive attitude 
towards the Bayes’ theory as displayed by a number of statisticians, which, 
however, does not include J. Neyman, turns out to be completely unfounded. 



PROBABILITY AS MEASURE 

By J. L. Doob 
University of Illinois 

The following pages outline a treatment of probability suitable for statisticians
and for mathematicians working in that field. No attempt will be made
to develop a theory of probability which does not use numbers for probabilities.
The theory will be developed in such a way that the classical proofs of probability
theorems will need no change, although the reasoning used may have a
sounder mathematical basis. It will be seen that this mathematical basis is
highly technical, but that, as applied to simple problems, it becomes the set-up
used by every statistician. The formal and empirical aspects of probability
will be kept carefully separate. In this way, we hope to avoid the airy flights
of fancy which distinguish many probability discussions and which are irrelevant
to the problems actually encountered by either mathematician or statistician.

We shall identify as Problem I the problem of setting up a formal calculus to
deal with (probability) numbers. Within this discipline, once set up, the only
problems will be mathematical. The concepts involved will be ordinary mathematical
ones, constantly used in other fields. The words “probability,”
“independent,” etc. will be given mathematical meanings, where they are used.

We shall identify as Problem II the problem of finding a translation of the 
results of the formal calculus which makes them relevant to empirical practice. 
Using this translation, experiments may suggest new mathematical theorems. 
If so, the theorems must be stated in mathematical language, and their validity 
will be independent of the experiments which suggested them. (Of course, if a 
theorem, after translation into practical language, contradicts experience, the 
contradiction will mean that the probability calculus, or the translation, is 
inappropriate.) 

The classical probability investigators did not separate Problems I and II 
carefully, thinking of probability numbers as numbers corresponding to events 
or to hypothetical truths, and always referring the numbers back to their 
physical counterparts. The measure approach to the probability calculus has 
put this approach into abstract form, and separated out the empirical elements, 
thus removing all aspects of Problem II. We shall explain this approach first 
in a simplified set-up, that which will be made to correspond (Problem II) to a 
repeated experiment in which the results of the $n$th trial can be any integer $x_n$
between 1 and N (inclusive), in which the experiments are independent of each 
other, and performed under the same conditions. (The set-up will be applicable, 
for example, to the repeated throwing of a die.) 
The measure approach treats this experiment as follows. Let $\omega$: $(x_1, x_2, \cdots)$
be any sequence of integers between 1 and $N$, inclusive. We consider $\omega$ as a
point in an infinite dimensional space $\Omega$. (Each point $\omega$ may be considered as a
logically possible sequence of results of the given experiment, and this fact will
guide us in solving Problem II.) A measure function is defined on certain sets
of points of $\Omega$ as follows. Let $p_1, \cdots, p_N$ be any numbers satisfying the
conditions

$p_j \geq 0, \quad j \geq 1, \qquad p_1 + \cdots + p_N = 1.$

(How these numbers are chosen in any particular problem will be explained
below. The method of choice is irrelevant to the mathematics, but is involved
in the solution of Problem II.) The set of all sequences beginning with $x_1 = a$
is given measure $p_a$. More generally, the measure of the set of all sequences
beginning with $x_1 = a_1, \cdots, x_n = a_n$ is defined as $p_{a_1} p_{a_2} \cdots p_{a_n}$. In this
way, as can be shown,$^1$ a completely additive measure function is determined
on certain point sets of $\Omega$, on a field $\mathfrak{F}$ of sets so large that all the usual Lebesgue
measure and integration theory is applicable. This means that there is a collection
$\mathfrak{F}$ of sets of points of $\Omega$ such that if $S_1, S_2, \cdots$ are finitely or infinitely
many sets in the collection, their sum $\sum_1^\infty S_i$, their intersection $\prod_1^\infty S_i$, and
their complements are also in the collection. Each set $S$ in $\mathfrak{F}$ has a definite
measure $P(S)$, $0 \leq P(S) \leq 1$, and if $S_1, S_2, \cdots$ are finitely or infinitely many
disjunct sets in $\mathfrak{F}$,

$P(S_1 + S_2 + \cdots) = P(S_1) + P(S_2) + \cdots.$
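
For sets determined by finitely many coordinates, the definition can be tried out directly. The sketch below is an illustration only: the loaded die `p` and the function name `cylinder_measure` are hypothetical, and exact fractions are used so that additivity can be checked without rounding.

```python
from fractions import Fraction
from itertools import product

def cylinder_measure(p, prefix):
    """Measure of the set of all sequences beginning with x_1 = a_1, ..., x_n = a_n."""
    m = Fraction(1)
    for a in prefix:
        m *= p[a]              # p_{a_1} p_{a_2} ... p_{a_n}
    return m

# Hypothetical choice of p_1, ..., p_N for N = 6 (a loaded die).
p = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
     4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}

# The N*N disjunct sets fixing (x_1, x_2) together exhaust the whole space.
total = sum(cylinder_measure(p, pre) for pre in product(p, repeat=2))

# The set {x_1 = 3} is the sum of the disjunct sets {x_1 = 3, x_2 = b}.
refined = sum(cylinder_measure(p, (3, b)) for b in p)

print(total, cylinder_measure(p, (3,)), refined)
```

The first printed value is the measure of the whole space; the last two agree, which is the additivity property in its simplest finite form.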

Problem II, the translation problem, is solved as follows. Each relevant
event is made to correspond to a point set of $\Omega$. A relevant event is a physical
concept, defined by imposing some set $C$ of conditions on the results of the
experiments. The corresponding $\Omega$-set is the set of sequences $(x_1, x_2, \cdots)$
satisfying the same set $C$ of conditions, imposed on the $x_j$. Thus the set of all
sequences beginning with $x_1 = a_1$, $x_2 = a_2$, is made to correspond to the event:
the result of the first experiment is $a_1$, of the second is $a_2$. As is to be expected,
the mathematical picture goes further than the real one. The “event” that 1 occurs
infinitely often in a sequence of trials has only conceptual significance, physically,
but the corresponding point set of $\Omega$, the set of all sequences $(x_1, x_2, \cdots)$ containing
infinitely many 1’s, is a perfectly definite point set whose measure can
be calculated in terms of $p_1, \cdots, p_N$. (In fact it is easily seen that this
measure is 1 or 0, according as $p_1 > 0$ or $p_1 = 0$.) By “the probability of an
event” we shall mean the measure of the corresponding $\Omega$-set. As this measure
has been defined, the probability that the $n$th trial results in a number $j$ is $p_j$,
and the probability that one trial results in $j$, and another in $k$, is $p_j p_k$.

1 Cf. A. Kolmogoroff, Ergebnisse der Mathematik, Vol. 2, No. 3, Grundbegriffe der
Wahrscheinlichkeitsrechnung, where the most complete treatment of the approach to the
probability calculus from the standpoint of measure is given.
The justification of the above correspondence between events and $\Omega$-sets is
that certain mathematical theorems can be proved, filling out a picture on the
mathematical side which seems to be an approximation to reality, or rather an
abstraction of reality, close enough to the real picture to be helpful in prescribing
practical rules of statistical procedure. The following two theorems are important
ones, from this point of view. These two theorems depend in no way
on observed facts. They are stated and proved in the customary language of
modern analysis.

Theorem A: Let $j_n$ be the number of the first $n$ coordinates of the point
$\omega$: $(x_1, x_2, \cdots)$ which are equal to $j$, where $j$ is some integer ($1 \leq j \leq N$) which
will be kept fixed throughout the discussion. Then $0 \leq j_n \leq n$, and $j_n$ varies from
point to point on $\Omega$: $j_n = j_n(\omega)$ is a function of $\omega$, that is of the sequence $(x_1, x_2, \cdots)$.
When $n \to \infty$, $j_n/n$ has not a unique limit independent of the sequence
$(x_1, x_2, \cdots)$ under consideration. In fact if $\omega$ is the point $(k, k, \cdots)$, $j_n(\omega) = 0$
for all $n$, unless $j = k$; if $\omega$ is the point $(j, j, \cdots)$, $j_n(\omega) = n$ for all $n$. It is
simple to give examples of sequences $\omega$: $(x_1, x_2, \cdots)$ for which $j_n(\omega)/n$ oscillates
without approaching a limit, as $n \to \infty$. But Theorem A (usually called the
strong law of large numbers) states that there is a set of sequences, i.e. an $\omega$-set $S$,
of measure 0, such that

(1) $\lim_{n \to \infty} \dfrac{j_n(\omega)}{n} = p_j,$

unless $\omega$ is in $S$. In other words the sequences for which (1) is not true are
exceptional in the sense of measure theory. If a new choice $\{p'_j\}$ of $p_j$'s is made,
then if $p'_j \neq p_j$, the new exceptional set includes all the sequences which were
not exceptional before, since the limit in (1) becomes $p'_j$. Thus $S$ depends
essentially on $p_j$. Theorem A is a generalization of Bernoulli's classical theorem
which states in our language that the measure of the set of sequences
$\omega$: $(x_1, x_2, \cdots)$ for which

$|j_n(\omega)/n - p_j| > \epsilon$

approaches 0, as $n \to \infty$, for any positive $\epsilon$. Theorem A is stronger because it
states that there is actual convergence, whereas Bernoulli's theorem only concludes
that there is a kind of convergence on the average.
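
Both halves of this discussion can be illustrated with a few lines of code. The sketch below is an illustration under stated assumptions: the first function draws one pseudo-random sequence with equal $p_j = 1/N$ and records $j_n/n$ at a few values of $n$; the second builds, as one hypothetical example of an exceptional sequence, blocks of lengths 1, 2, 4, ... consisting alternately of $j$'s and of $k$'s, for which $j_n/n$ keeps oscillating. A simulation shows typical behavior only and of course proves nothing about sets of measure 0.

```python
import random

def success_ratios(j, N, checkpoints, seed=0):
    """j_n / n at the given values of n, for one simulated sequence with p_j = 1/N."""
    rng = random.Random(seed)
    marks = set(checkpoints)
    count = 0
    ratios = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.randint(1, N) == j:
            count += 1
        if n in marks:
            ratios[n] = count / n
    return ratios

def block_sequence_ratios(num_blocks):
    """j_n / n at the end of each block of a sequence made of blocks of length
    1, 2, 4, ..., the even-numbered blocks all equal to j, the others equal to k."""
    count = total = 0
    out = []
    for m in range(num_blocks):
        length = 2 ** m
        total += length
        if m % 2 == 0:
            count += length
        out.append(count / total)
    return out

ratios = success_ratios(j=3, N=6, checkpoints=[100, 10_000, 200_000])
osc = block_sequence_ratios(16)
print(ratios)
print([round(r, 3) for r in osc[-4:]])
```

For the simulated sequence the ratio settles near $1/6$; for the block sequence it returns to exactly $1/3$ at the end of every block of $k$'s and to a value above $2/3$ at the end of every block of $j$'s.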

Theorem A corresponds to certain observed facts, relating to the clustering
of “success ratios,” giving rise to empirical numbers $\bar{p}_j$. If the statistician
wishes to apply his calculus to a given experiment (Problem II), he sets $p_j = \bar{p}_j$.
There has been frequent discussion of the problem of determining the $\bar{p}_j$.
This discussion of the $\bar{p}_j$ is sometimes held on so high a plane that the innocent
bystander may wonder to what purpose such abstract philosophic concepts could
possibly be put, besides that of stimulating further discussion on a still higher
plane. The principal purpose of this paper is to discuss Problem I, but a few
words on Problem II might not be out of place here. Almost everyone who is
going to use probability numbers, the $p_j$, for other than conversational purposes,





derives them in the same way. There is a judicious mixture of experiments
with reason founded on theory and experience. Thus if a coin is tossed by an
experimenter who has examined the coin, and found that it had heads on one
side but not on both, that it seemed balanced, and that (as a confirming check)
tossing a hundred times gave around 50 heads, the experimenter would use $\frac{1}{2}$
as the probability of obtaining heads in his further reasoning. Of course there
is no logic compelling this. The experimenter may have been fooled. A coin
far out of balance may turn up 50 heads in 100 throws. But man must act,
and the above procedure has been found useful, which is all that is desired. In
many experiments, less reliance can be placed on a preliminary physical examination
of the experimental conditions, and more must be placed on the actual
working out of the experiment, as in the analysis of machine products. In that
case, the actual results must be examined with great care, before attempting
to use the above mathematical set-up. It sometimes may even be possible to
change the experimental conditions to make the mathematics applicable.$^2$ In
all cases, such mathematical theorems as Theorem A and the following Theorem B
give the basis for applying the formal apparatus to practice. Indeed,
the criterion of application includes the verification of special cases of the practical
versions of Theorems A and B.

Theorem B: Let $f_n(x_1, \cdots, x_{n-1})$ ($n > 1$) be any function of the indicated
variables, except that we suppose $f_n$ only takes on the values 0, 1. Let $\omega$: $(x_1, x_2, \cdots)$
be a given point of $\Omega$. Let $n'$ be the number of the first $n$ integers $i$ such that
$f_i(x_1, \cdots, x_{i-1}) = 1$, and let $j'_n$ be the number of the first $n$ integers $i$ such that
$f_i(x_1, \cdots, x_{i-1}) = 1$, and $x_i = j$. Then $j'_n$, $n'$ are functions of $\omega$: $(x_1, x_2, \cdots)$.
If $f_1 \equiv f_2 \equiv \cdots \equiv 1$, $j'_n = j_n$, $n' = n$, where $j_n$ is as defined above. Suppose
that there is an $\Omega$-set $S_0$ of measure 0 such that $n' \to \infty$, as $n \to \infty$, unless $\omega \in S_0$.
Theorem B states that there is then an $\Omega$-set $S'$ of measure 0, such that if
$\omega$: $(x_1, x_2, \cdots)$ is not in $S'$,

(1') $\lim_{n \to \infty} \dfrac{j'_n}{n'} = p_j.$

(The set $S'$ will depend on the given functions $f_1, f_2, \cdots$ and on the $p_j$, but is
fixed, once these have been chosen.) This mathematical theorem corresponds
to certain observed facts (usually summarized by stating that no (successful)
system of play is possible). In fact, it states, in the language of practice, that
rejecting certain trials, using as a criterion of acceptance or rejection the results
of preceding trials, rejecting the $i$th trial if $f_i(x_1, \cdots, x_{i-1}) = 0$, does not affect
the outcome of a game of chance, or, more precisely, does not affect the validity
of the physical fact corresponding to Theorem A. If $f_1 \equiv f_2 \equiv \cdots \equiv 1$, (1')
becomes (1). The hypothesis that $n' \to \infty$ as $n \to \infty$ unless $\omega \in S_0$ is made to
insure that infinitely many trials will be accepted. As an example of the


2 Cf. W. A. Shewhart, Statistical Method from the Viewpoint of Quality Control,
Washington, 1939.
possible variety in the definition of the $f_i$, we might define $f_i$ as 1 if $x_{i-1} = N$,
and $f_i = 0$ otherwise, so trials are accepted only if the previous trial resulted in
the number $N$. Or much more complicated systems can easily be devised in
which the criterion of acceptance of the $n$th trial depends on a varying number
of the results of preceding trials. This theorem gives a mathematical counterpart
to the physical idea of the mutual independence of repeated trials.
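
The example system just described (accept a trial only if the previous trial resulted in $N$) is easy to try on a simulated sequence. The sketch below is an illustration only, with equal $p_j = 1/N$ assumed and hypothetical names; it counts $n'$ and $j'_n$ as defined in Theorem B and returns their ratio.

```python
import random

def selected_ratio(j, N, trials, seed=1):
    """Return (j'_n / n', n') for the rule f_i = 1 exactly when x_{i-1} = N."""
    rng = random.Random(seed)
    prev = None                # no trial precedes the first, so it is rejected
    accepted = hits = 0
    for _ in range(trials):
        x = rng.randint(1, N)  # ASSUMED: all p_j equal to 1/N
        if prev == N:          # the trial is accepted
            accepted += 1
            if x == j:
                hits += 1
        prev = x
    return hits / accepted, accepted

ratio, n_accepted = selected_ratio(j=3, N=6, trials=600_000)
print(f"accepted trials n' = {n_accepted}, j'_n / n' = {ratio:.4f}")
```

About one trial in six is accepted, and among the accepted trials the relative frequency of $j$ should again be close to $p_j = 1/6$, as (1') asserts.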

To summarize, mathematically (Problem I) the study has been reduced to
that of the measure properties of $\Omega$. This can be considered independently of
any physical correspondence. The physical correspondence (Problem II) makes
any event $\mathcal{E}$ correspond to a point set $E$ of $\Omega$; the “probability of $\mathcal{E}$” becomes
the measure of $E$. Thus “the probability that the result of the first experiment
is 3” becomes the measure of the set of sequences $(x_1, x_2, \cdots)$ beginning with
$x_1 = 3$. We have given no sharp definition of probability as a physical concept.
If the above mathematical set-up, after translation, using some set of $p_j$'s,
seems to fit a given physical set-up, any event will be said to have as its probability
the measure of the corresponding $\Omega$-set. We have attempted to give no
intrinsic a priori definition of the probability of an event: such a definition is
quite unnecessary for our purposes. All that was required was a basis for prescribing
the usual statistical procedures, and we have described such a basis.

In the above example, there would have been no new difficulty introduced
if the $x_n$ were not restricted to integral values, but allowed to take on any
numerical values. The general point $\omega$: $(x_1, x_2, \cdots)$ of $\Omega$ would now be any
sequence of real numbers. Instead of choosing the numbers $p_1, \cdots, p_N$ we
choose a “distribution function” $F(x)$, a monotone function with the following
properties:

$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1, \qquad F(x - 0) = F(x).$

Measure on $\Omega$ is defined as follows. The set of all sequences beginning with $x_1$
such that $a \leq x_1 < b$ is given measure $F(b) - F(a)$. (The number $F(b)$ is
called “the probability that $x_1 < b$.”) More generally, the measure of the set
of all sequences $(x_1, x_2, \cdots)$ beginning with $x_1, \cdots, x_n$, such that
$a_j \leq x_j < b_j$, $j = 1, \cdots, n$, is defined as $\prod_{j=1}^{n} [F(b_j) - F(a_j)]$. Thus if $F(x)$ defines a
simple rectangular distribution: $F(x) = 0$ for $x < 0$, $F(x) = x$ for $0 \leq x \leq 1$,
$F(x) = 1$ for $x > 1$, $\Omega$-measure becomes (infinite dimensional) volume in the
(infinite dimensional) unit cube. The correspondence (Problem II) between
events and point sets of $\Omega$ is defined just as before. Sometimes it may be useful,
in considering experiments giving rise to pairs of numbers, to let each $x_n$ be a
pair of numbers so that $\Omega$ becomes a sequence of points of a plane instead of a
sequence of points of a line. In all cases there are mathematical theorems
true of the resulting $\Omega$ which guide us (Problem II) in deciding just how the
$\Omega$-measure is to be defined, that is, how $F(x)$ is to be defined, in dealing with a
given practical problem. But the essential point is this. Once $\Omega$-measure has
been defined, no changes or further hypotheses are possible or necessary. All
relevant probability questions are answerable. Thus consider a question of the
following type: if the experiments are grouped in some way,$^3$ with what probability
will the groups have some given regularity property?$^4$ The question singles
out a set $E$ of sequences of $\Omega$ and asks: what is the measure of $E$? The problem
may or may not be difficult mathematically,$^5$ depending on the grouping, but
the original definition of measure on $\Omega$ needs no enlargement to answer it.
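
The grouping question taken up in the footnotes to this paragraph (a game is won by the first player to win two trials out of three) can be tried numerically. The sketch below is an illustration only: the value $p = 0.6$, the seed, and the function name are hypothetical, and the limit $p^3 + 3p^2(1 - p)$ is the answer quoted in the footnote.

```python
import random

def game_win_ratio(p, games, seed=2):
    """(games won by player alpha) / (games played), each game grouping 2 or 3 trials."""
    rng = random.Random(seed)
    won = 0
    for _ in range(games):
        a = b = 0
        while a < 2 and b < 2:       # the group ends as soon as one player has two wins
            if rng.random() < p:
                a += 1
            else:
                b += 1
        won += (a == 2)
    return won / games

p = 0.6                               # hypothetical probability that alpha wins a trial
ratio = game_win_ratio(p, 200_000)
limit = p**3 + 3 * p**2 * (1 - p)     # probability that alpha wins a game
print(f"observed ratio {ratio:.4f}, stated limit {limit:.4f}")
```

The simulation uses nothing beyond the original sequence of trials; the groups of two or three are read off from that sequence, which is the sense in which the measure on $\Omega$ needs no enlargement.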

Technically, the mathematics has become the mathematics of a special type
of measure defined on a space of infinitely many dimensions. If, however, there
is an integer $\nu$ such that only at most $\nu$ experiments are to be considered, we
need only consider the $\nu$-dimensional space of points $(x_1, \cdots, x_\nu)$, defining
measure in this space in the same way as on $\Omega$. Thus if $x_n$ has the rectangular
distribution defined above, the measure in $(x_1, \cdots, x_\nu)$-space becomes ordinary
$\nu$-dimensional volume in the unit cube. Perhaps the most common measure a
statistician considers is that in which the measure of an $(x_1, \cdots, x_\nu)$-set $E$
becomes “the probability that the point $(x_1, \cdots, x_\nu)$ representing an independent
sample of $\nu$ from a normal distribution of mean 0 and variance $\sigma^2$
will lie in $E$”:

(2) $P\{E\} = \sigma^{-\nu} (2\pi)^{-\nu/2} \int \cdots \int_E \exp\left[-\dfrac{1}{2\sigma^2} \sum_{j=1}^{\nu} x_j^2\right] dx_1 \cdots dx_\nu.$
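
As a small check on a measure of the type (2), take for $E$ the cube $|x_j| \leq c$, $j = 1, \cdots, \nu$, a hypothetical choice made here only because the integral then factorizes into $\nu$ equal one-dimensional integrals. The sketch compares that closed form with the relative frequency of sample points falling in $E$.

```python
import math
import random

def cube_measure(c, sigma, nu):
    """P{E} for E = {|x_j| <= c, j = 1..nu} under the normal product measure."""
    return math.erf(c / (sigma * math.sqrt(2.0))) ** nu

def cube_measure_mc(c, sigma, nu, samples=200_000, seed=3):
    """Relative frequency of independent normal samples of size nu lying in E."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        if all(abs(rng.gauss(0.0, sigma)) <= c for _ in range(nu)):
            inside += 1
    return inside / samples

exact = cube_measure(c=1.0, sigma=1.0, nu=3)
approx = cube_measure_mc(c=1.0, sigma=1.0, nu=3)
print(f"integral: {exact:.4f}, relative frequency: {approx:.4f}")
```

The first number is a volume-type integral in 3 dimensions; the second is what a statistician would call an observed proportion, and the two should agree to within sampling error.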

This example makes it obvious that the statistician is always doing measure
theory, even though he may not state that fact explicitly. If the number of
experiments has no upper bound conceptually (mathematically, when the number
of dimensions $\nu$ may increase without limit, as in Theorems A, B), it is much
more convenient to use the space $\Omega$, in terms of which experiments with varying
numbers of trials can be considered simultaneously. The classical proofs of
probability theorems, such as Bernoulli’s theorem (the law of large numbers)
are perfectly correct. If the “probability of an event” is interpreted as the
measure of a set, these proofs do not even need verbal changes. There can be
no question of the need for any axiomatic development beyond that necessary
for measure theory, and the probability calculus can lead to no contradiction,
unless the theory of measure is faulty.

It is customary for probability theorists to stop their discussions when the 
present stage is reached, so that the beginnings of a formal calculus have been 
constructed to deal with a repetition of independent experiments, conducted 

3 A grouping is necessary, for example, when two players are playing a game in which
two out of three wins in the trials win a game. The trials are then grouped into successive
groups of two or three, depending on how they come out.

4 Continuing the preceding note, the question might be: will the ratio (games won by
player $\alpha$)/(games played) approach a limit with probability 1, that is, for all of the original
sequences $\{x_n\}$ except possibly some forming a set of measure 0?

5 The answer to the question of the preceding notes is simple. If $p$ is the probability
that player $\alpha$ wins a trial, the ratio in question approaches $p^3 + 3p^2(1 - p)$, the probability
that $\alpha$ wins a game, with probability 1.
under the same conditions. Perhaps this is because of the following widely held
syllogism: probability is something dealing with random events; random events
are events having no influence on each other; therefore.... Unfortunately
mathematicians and statisticians must deal with many problems involving dependent
probabilities, whose solutions require the most delicate and careful
applications of modern analysis. The rudimentary calculi which the outsiders
find esthetically or philosophically pleasing are usually either insufferably awkward
or completely insufficient for the needs of professionals. There is a strange
situation, which one observer has facetiously described somewhat as follows: it
is true with probability 1 that the technical workers in probability use the
measure approach, but that the writers on “probability in general,” descendants
of Carlyle’s professor, do not consider this approach worth much more than a
passing remark.$^6$ The following pages outline how our previous treatment is
generalized to deal with problems in which it is desirable to have the distribution
of $x_j$ vary with $j$ (so that physically the experiments are no longer the same),
and in which the $x_j$ do not have to correspond to the results of independent
experiments. Some attempt will also be made to show how the modern mathematical
theory of real functions is applied to the probability calculus.

Let $x_j = x_j(\omega)$ be the $j$th coordinate of the point $\omega$: $(x_1, x_2, \cdots)$. Then as
the sequence $\omega$: $(x_1, x_2, \cdots)$ varies, $x_j$ does also: $x_j(\omega)$ is a function of $\omega$. The
functions $x_1(\omega)$, $x_2(\omega)$, $\cdots$ are functions defined on $\Omega$, an abstract space on which
a measure has been defined. Moreover $\Omega$-measure has been defined in such a
way that the $\Omega$-set for which $x_j(\omega) < K$ ($j$, $K$ fixed) is an $\Omega$-set whose measure
has been defined. (This set is composed of all sequences $(x_1, x_2, \cdots)$ whose
$j$th coordinate is $< K$, and the measure is $F(K)$, using our last definition of
$\Omega$-measure.) In the terminology of measure theory, $x_j(\omega)$ is thus a measurable
function. The study of the measure relations of $\Omega$, and this is the whole of our
probability calculus, can be considered, from this point of view, as the study of
the properties of a sequence of measurable functions, one with very special
properties, as we shall see, defined on some space. A measurable function
defined on $\Omega$ is usually called a chance variable, in the theory of probability.
(This terminology is somewhat dangerous, because it mixes Problems I and II.)
The whole apparatus of modern real variable theory is applicable to these
chance variables. Thus if $f(\omega)$ is a chance variable (measurable function of $\omega$)
(physically, a function of the observations), it is customary to define a number
called its expectation. This number is simply the integral of $f(\omega)$, with respect
to the given $\Omega$-measure. The fact that the expectation of the sum of two chance
variables is the sum of their expectations is simply the familiar theorem that the
integral of the sum of two functions is the sum of their integrals. Let $S(j, K)$
be the $\Omega$-set defined by the inequality $x_j < K$. Up to now we have supposed

6 This analysis, like every other probability statement, is only an approximation to
reality, but a fairly close one.
that the measure of $S(j, K)$ is independent of $j$, that is that the distribution of $x_j$
is independent of $j$. We have also supposed that$^7$

(3) $P\{S(1, K_1) \cdots S(n, K_n)\} = P\{S(1, K_1)\} \cdots P\{S(n, K_n)\}$

for any positive integer $n$, and numbers $K_1, \cdots, K_n$. That is, we have supposed
that $x_1(\omega)$, $x_2(\omega)$, $\cdots$ are mutually independent chance variables.$^8$ In
fact probability measure on $\Omega$ has been defined just to make the foregoing two
facts true. Mutual independence is a very strong hypothesis to impose on a
sequence of functions. In many probability problems (Markoff chains for
example), more general measures must be defined on $\Omega$. The sequence $x_1(\omega)$,
$x_2(\omega)$, $\cdots$ whose properties are those of $\Omega$-measure, is then no longer a sequence
of independent functions, and the distribution of $x_j$ can vary with $j$.
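
The difference between the two kinds of measure shows up already on sets fixed by the first two coordinates. The sketch below is an illustration under stated assumptions: it works with two outcomes 0 and 1, uses events of the form $x_j = a$ in place of the sets $S(j, K)$, and takes a hypothetical two-state Markoff chain whose transition probabilities favor repeating the previous outcome.

```python
from fractions import Fraction as F
from itertools import product as cartesian

def product_measure(p, prefix):
    """Measure of {x_1 = a_1, ..., x_n = a_n} when the coordinates are independent."""
    m = F(1)
    for a in prefix:
        m *= p[a]
    return m

def markov_measure(init, trans, prefix):
    """Measure of the same set when x_1, x_2, ... form a Markoff chain."""
    m = init[prefix[0]]
    for a, b in zip(prefix, prefix[1:]):
        m *= trans[a][b]
    return m

def marginal(measure, j, a, states=(0, 1)):
    """Measure of {x_j = a}, obtained by summing over the first j coordinates."""
    return sum(measure(pre) for pre in cartesian(states, repeat=j) if pre[-1] == a)

p = {0: F(1, 2), 1: F(1, 2)}
trans = {0: {0: F(3, 4), 1: F(1, 4)},      # hypothetical transition probabilities
         1: {0: F(1, 4), 1: F(3, 4)}}

indep = lambda pre: product_measure(p, pre)
chain = lambda pre: markov_measure(p, trans, pre)

for name, mu in (("independent", indep), ("Markoff chain", chain)):
    joint = mu((1, 1))
    prod = marginal(mu, 1, 1) * marginal(mu, 2, 1)
    print(f"{name}: P(x1=1, x2=1) = {joint}, P(x1=1) P(x2=1) = {prod}")
```

In both cases each coordinate separately has the same distribution, but only for the product measure does the joint measure equal the product of the separate measures, as (3) requires.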

At this level, the study becomes the study of any sequence of measurable
functions, defined on some space of total measure 1. If $f$, $g$ are given chance
variables, they may turn out to be independent. In that case the theorem that
the expectation of their product is the product of their expectations becomes,
when translated into mathematical language, the familiar theorem that

$\iint f(x) g(y) \, dx \, dy = \int f(x) \, dx \int g(y) \, dy.$

The mathematical theorems are not simply analogues of the probability theorems;
they themselves are those theorems. When stated mathematically, the
probability theorems need no proof: they need only recognition as standard
results.

Empirical needs suggest that certain functions called conditional probability
distributions, and conditional expectations, should be defined in a certain way.
This is possible, as a formal matter,$^9$ and the theorems then proved about these
functions give them their usual meaning when translated into practical language.
These functions are extremely useful tools in dealing with mutually dependent
(that is not independent) chance variables.

The above approach is easily generalized to the stage needed in the study of 
Brownian movements or of time series, in which, instead of the proper initial 

7 $P\{S\}$ was defined as the measure of the $\Omega$-set $S$.

8 The $n$ chance variables $f_1(\omega), f_2(\omega), \cdots, f_n(\omega)$ are said to be independent if for every
set of $n$ numbers $K_1, \cdots, K_n$, the following equality is true.

$P\{f_j(\omega) < K_j, \; j = 1, \cdots, n\} = \prod_{j=1}^{n} P\{f_j(\omega) < K_j\},$

where $P\{\cdots\}$ denotes the $\Omega$-measure of the $\Omega$-set defined by the conditions in the braces.
Thus in the example of a normal distribution in $\nu$ dimensions given above, $x_1, \cdots, x_\nu$
are independent functions on the space of $\nu$ dimensions, a fact which follows readily from
the fact that the $\nu$-dimensional density function is the product of $\nu$ functions of the separate
variables.

9 Cf. Kolmogoroff, loc. cit.
abstraction being a sequence $\{x_n\}$ of numbers, we have a one-parameter family
$\{x_t\}$ ($t$ takes on all real values). The number $x_t$ may, for example, be thought
of as the $x$-coordinate of a particle at time $t$. There is no difference in principle
here: $\Omega$ is now the space of functions of $t$, instead of the space of sequences, that
is functions of $n$. From the other point of view, instead of studying the properties
of a sequence of measurable functions, it becomes necessary to study the
properties of a one-parameter family of measurable functions.



DISCUSSION OF PAPERS ON PROBABILITY THEORY 

By R. von Mises and J. L. Doob 

1. Comments by R. von Mises. Professor Doob outlines a new theory of
probability starting with the following three basic conceptions. First, he uses
the notion of an infinite sequence of trials or better: of an infinite sequence of
numbers $x_1, x_2, x_3, \cdots$ which can be considered as the outcomes of infinitely
repeated uniform experiments. Second, he introduces (in his Theorem A) the
limit of the relative frequency of a particular outcome $a$. Third, (in his Theorem B)
the notion of place selection defined by a sequence of functions
$f_n(x_1, x_2, \cdots, x_{n-1})$ is employed. All these three concepts are completely
strange to the so-called classical theory as developed by Bernoulli, Laplace,
Poisson, etc. They have been introduced and made the cornerstone of probability
theory in my papers published since 1919. I daresay that in no probability
investigation before 1919 were any of those notions even mentioned.

This concerns what Professor Doob calls the Problem I or the purely mathematical
aspect of the question. As to his Problem II or the relationship between
the formal calculus and real facts Professor Doob stresses that the actual values
for probabilities that enter as data into a particular argument have to be drawn
from long, finite sequences of experiments. This is in complete accordance
with the standpoint of my theory and in strict contradiction to the classical
conception which knows only “a priori” probabilities determined by “equally
likely cases.”

In both theories, Professor Doob's and mine (not in the classical) a mathematical
model or picture is associated with a long sequence of uniform
experiments. These models are different in both theories. My model (the
“kollektiv”) consists of one infinite sequence $x_1, x_2, x_3, \cdots$ in which the
limit of the relative frequency of each possible outcome $a$ exists and is indifferent
to a place selection; the value of this limit is called the probability of $a$.

On the other hand Professor Doob’s model implies all logically possible sequences
which form a space $\Omega$, and he shows that in this space a measure function
can be introduced which fulfills the following conditions: (1) If $m$ is a positive
integer, the set of all sequences the $m$th element of which is $a$ has a measure $p_a$
independent of $m$; (2) the set of all sequences in which the relative frequency
of $a$-results has either no limit or a limit different from $p_a$ has measure zero; (3) if $S$ is any
place selection, the set of all sequences $\omega$ for which the relative frequency of $a$
in $S(\omega)$ has either no limit or a limit different from $p_a$ has likewise measure zero; this value
$p_a$ is called the probability of the outcome $a$. It then can be shown that a
probability in this sense can be ascribed to certain events, i.e. to certain types
of experiments which in some way are connected with the sequence of basic
experiments. E.g. if the original sequence consists of the single successive
tossings of a die, the derived sequence may consist of pairs of tossings with the
sum of the outcoming points as new value of $a$. The new probabilities $p'_a$ are
found as measures of certain sets in the original measure system established in $\Omega$.

There is no doubt that the model used by Professor Doob for representing
empirical sequences of uniform experiments is logically consistent. Its practical
usefulness depends on how the usual problems of combining different kollektivs
and so on can be settled within this scheme. This has to be shown in detail.
It seems to me that my conception is simpler in its application and closer to
reality, while his model may be considered more satisfactory from a logical
standpoint since it avoids the difficulties connected with the concept of “all
place selections.” At any rate, however, there is no contradiction or irreconcilable
contrast: both theories are essentially statistical or frequency theories,
equally far from the classical conception based on “equally likely cases.” In
both theories probabilities are, of course, measures of sets.

2. Comments by J. L. Doob. It is perhaps unfortunate that Professor von 
Mises’ treatment of probability problems, based on typical sequences (“collec¬ 
tives,” “admissible numbers”), is commonly called the “frequency theory.” 1 It 
is clear to any reader of our papers (identified as M and D below) that the idea 
of frequency, at least in the discussion of the relation of mathematics to prac¬ 
tice, is no more fundamental to one approach than to the other. In one mathe¬ 
matical treatment frequency notions first appear in the theorems, whereas in 
the other they first appear in the axioms; but they appear in both. The principal 
objection the measure advocates have to the frequency approach is that it is 
awkward mathematically. Anyone who doubts this awkwardness need only 
examine various books published recently, using this approach, to see what a 
lot of fussy detail is involved merely in proving such elementary results as the 
Tchebycheff inequality or the Bernoulli theorem. One author considers it necessary
to have his chance variables so restricted that if x is a chance variable, the
event x < k has a probability assigned to it only if k is not in some exceptional 
set, which may be infinite. To take another example, consider the coin tossing 
game discussed in both M and D, in which two out of three wins at tosses win 
a game. Apparently the probability analysis of this game is somewhat difficult 
in terms of the frequency theory. As the quite elementary treatment outlined
in D shows, there is no difficulty involved, using the measure approach. The 
question is simple: a set of chance variables is given (corresponding to the 
original tosses); a new set is determined from them (corresponding to the 
grouping into games). Only elementary algebraic manipulation is required to 
verify that the new chance variables are mutually independent in the mathe¬ 
matical sense, (Cf. D), and have the same distribution, so the law of large 
numbers is applicable. Professor von Mises considers that the measure theory 
cannot handle this problem. I on the other hand consider that this problem 
exhibits the mathematical disadvantages of the frequency theory. 

1 This identifying name will be used below also.
The frequency theory reduces everything to the study of sequences of mutually
independent chance variables, having a common distribution. “Probability 
theory is the study of the transformations of admissible numbers” writes Pro¬ 
fessor von Mises. This point of view is extremely narrow. Many problems of 
probability, say those involved in time series, can only be reduced in a most 
artificial way to the study of a sequence of mutually independent chance vari¬ 
ables, and the actual study is not helped by this reduction, which is merely a 
tour de force. 

It is claimed in M that the axioms of measure theory only describe the distribution
within one collective (M, p. 00). This statement seems to mean that
only the measure relations (using the notation of D) of the first coordinate
function can be discussed in the measure theory, that is only probabilities
of the type: the probability that $x_1 < k$ (in the language of practice, “the
probability that the result of the first experiment is less than k”) are discussed.
Actually, however, (Cf. D) the measure theory can discuss any number of experiments
simultaneously, using the appropriate space $\Omega$.

Many of the debates between the advocates of the various probability theories 
have been wasted, because some of the debaters talk mathematics, others physics. 
With this in mind, I should like to stress again 2 that (except for a few philo¬ 
sophically inclined Englishmen) everyone calculates probability numbers in the 
same way—a combination of reasoning based on experience and helped by 
theory, with examination of the experimental conditions and the results of trials. 
Frequency considerations necessarily play a large part. The fact that almost 
everyone calculates probability numbers in the same way does not alter the 
fact that one mathematical theory may be more useful or convenient than 
another in dealing with these probability numbers. 

In closing, it seems proper to call attention to what the measure advocates 
consider the real services and contributions of the approach of Professor von 
Mises. Professor von Mises was the first to stress the importance of the second 
of two fundamental generalizations of experience in dealing with repeated mu¬ 
tually independent experiments of the same character: (1) the clustering of 
success ratios and (2) the fact that this clustering is unaffected by a system of 
rejection as described in M and D. These two generalizations of experience are 
certainly fundamental. The only point under discussion here is how such gen¬ 
eralizations are to be put into a mathematical setting. The original such setting 
of Professor von Mises was criticized as not really mathematical. The setting 
now proposed by Copeland and others is criticized by the measure advocates as 
mathematically inflexible and clumsy. But it is significant that even in a treat¬ 
ment of the measure approach, as in D, it was felt essential to stress the mathe¬ 
matical interpretation of the two empirical generalizations of Professor von 
Mises. In the terminology of D, the measure advocates consider the contribu¬ 
tion of Professor von Mises’ approach to be a contribution to a solution of 
Problem II, not to Problem I, the mathematical problem. 

2 We are not talking mathematics now, but the application of mathematics.



CONTINUED FRACTIONS FOR THE INCOMPLETE BETA FUNCTION 1 

By Leo A. Aroian 
Hunter College 


1. Introduction. Existing literature on the problem of calculating the incomplete
Beta function

(1.1)  $B_x(p, q) = \int_0^x x^{p-1}(1 - x)^{q-1}\,dx, \qquad 0 < x < 1,\ p > 0,\ q > 0,$

and the levels of significance of Fisher's z [1] leaves further work to be done.
Müller's continued fraction and a new continued fraction are shown to possess
complementary features covering the range of $B_x(p, q)$ for all values of x, p, q.
Previous methods of computing $I_x(p, q) = B_x(p, q)/B(p, q)$ are given in [2], [5],
[6], [8], [10], [13], [14], [15].

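Müller's continued fraction is

(1.2)  $I_x(p, q) = C\left[\dfrac{b_1}{1+}\,\dfrac{b_2}{1+}\,\dfrac{b_3}{1+}\,\dfrac{b_4}{1+}\cdots\right],$

where

$C = \dfrac{\Gamma(p + q)}{\Gamma(p + 1)\Gamma(q)}\,x^p(1 - x)^{q-1}, \qquad b_1 = 1,$

$b_{2s} = -\dfrac{(p + s - 1)(q - s)}{(p + 2s - 2)(p + 2s - 1)}\cdot\dfrac{x}{1 - x}, \qquad b_{2s+1} = \dfrac{s(p + q + s - 1)}{(p + 2s - 1)(p + 2s)}\cdot\dfrac{x}{1 - x}.$
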


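A convergent infinite series $1 + \sum_{s=1}^{\infty} d_s x^s$ can be converted into an infinite continued
fraction of the form $\dfrac{1}{1+}\,\dfrac{c_1 x}{1+}\,\dfrac{c_2 x}{1+}\cdots$, where [4], [9] p. 304,

(1.3)  $c_1 = -\beta_1, \qquad c_2 = -\dfrac{\beta_2}{\beta_1}, \qquad c_s = -\dfrac{\beta_{s-3}\,\beta_s}{\beta_{s-2}\,\beta_{s-1}}, \quad s > 2 \quad (\beta_0 = 1),$
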


1 Presented at a meeting of the American Mathematical Society, October 28, 1939, New 
York City. 
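where

$\beta_{2s} = \begin{vmatrix} 1 & d_1 & \cdots & d_s \\ d_1 & d_2 & \cdots & d_{s+1} \\ \vdots & \vdots & & \vdots \\ d_s & d_{s+1} & \cdots & d_{2s} \end{vmatrix}, \qquad \beta_{2s+1} = \begin{vmatrix} d_1 & d_2 & \cdots & d_{s+1} \\ d_2 & d_3 & \cdots & d_{s+2} \\ \vdots & \vdots & & \vdots \\ d_{s+1} & d_{s+2} & \cdots & d_{2s+1} \end{vmatrix}, \qquad \beta_{2s} \neq 0,\ \beta_{2s+1} \neq 0.$

The infinite continued fraction found in this manner is called the corresponding
continued fraction and the power series is said to be semi-normal if $\beta_{2s} \neq 0$,
$\beta_{2s+1} \neq 0$.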
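
The conversion rule can be checked mechanically. The following is a minimal sketch, not the author's procedure: it assumes the determinants are the Hankel determinants of the series coefficients with $\beta_0 = 1$, uses exact rational arithmetic, and compares the result with the closed-form coefficients $c_{2s+1} = -(p+s)(p+q+s)/((p+2s)(p+2s+1))$ and $c_{2s} = s(q-s)/((p+2s-1)(p+2s))$ that belong to the series $1 + \sum (p+q)\cdots(p+q+r)\,x^{r+1}/((p+1)\cdots(p+r+1))$. The function names are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination on Fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def beta(k, d):
    """beta_k as a Hankel determinant of d[0] = 1, d[1], d[2], ...

    Even k = 2s starts its first row at d[0]; odd k = 2s + 1 starts at d[1].
    """
    s, odd = divmod(k, 2)
    size = s + 1
    return det([[d[i + j + odd] for j in range(size)] for i in range(size)])


def cf_coefficients(d, count):
    """c_1 .. c_count of 1/(1+ c_1 x/(1+ c_2 x/(1+ ...))), with beta_0 = 1."""
    b = [beta(k, d) for k in range(count + 1)]
    coeffs = [-b[1]]
    if count >= 2:
        coeffs.append(-b[2] / b[1])
    for s in range(3, count + 1):
        coeffs.append(-b[s - 3] * b[s] / (b[s - 2] * b[s - 1]))
    return coeffs[:count]


def series_coefficients(p, q, count):
    """d_0 = 1 and d_{r+1} = d_r (p + q + r) / (p + 1 + r)."""
    d = [Fraction(1)]
    for r in range(count):
        d.append(d[-1] * (p + q + r) / (p + 1 + r))
    return d


def closed_form(i, p, q):
    """Closed-form c_i for the series above (i = 2s or 2s + 1)."""
    s, odd = divmod(i, 2)
    if odd:
        return -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
    return s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))


p, q = Fraction(5, 2), Fraction(3, 2)
from_determinants = cf_coefficients(series_coefficients(p, q, 6), 6)
print(from_determinants == [closed_form(i, p, q) for i in range(1, 7)])
```

Exact fractions are used so that the comparison is an identity rather than a floating-point approximation; for integer q some determinants vanish and the series is no longer semi-normal, so the sketch is meant for non-integer q.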


2. A new continued fraction. Müller found his continued fraction by converting
in the manner of the preceding paragraph

(2.1)  $I_x(p, q) = \dfrac{\Gamma(p + q)\,x^p(1 - x)^{q-1}}{\Gamma(p + 1)\Gamma(q)}\left\{1 + \sum_{r=0}^{\infty}\dfrac{(q - 1)(q - 2)\cdots(q - r - 1)}{(p + 1)(p + 2)\cdots(p + r + 1)}\left(\dfrac{x}{1 - x}\right)^{r+1}\right\}, \qquad x < \tfrac{1}{2}.$



We convert

(2.2)  $I_x(p, q) = \dfrac{\Gamma(p + q)\,x^p(1 - x)^q}{\Gamma(p + 1)\Gamma(q)}\left\{1 + \sum_{r=0}^{\infty}\dfrac{(p + q)(p + q + 1)\cdots(p + q + r)}{(p + 1)(p + 2)\cdots(p + r + 1)}\,x^{r+1}\right\}, \qquad 0 < x < 1.$



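Consequently

$\beta_1 = \dfrac{p + q}{p + 1}, \qquad \beta_2 = \dfrac{(p + q)(1 - q)}{(p + 1)^2(p + 2)},$

$\beta_{2s+1} = \dfrac{(p + q)(p + q + 1)\cdots(p + q + s - 1)(p + q + s)}{(p + s + 1)(p + s + 2)\cdots(p + 2s)(p + 2s + 1)}\,\beta_{2s},$

$\beta_{2s+2} = \dfrac{(1 - q)(2 - q)\cdots(s - q)(s + 1 - q)\,(s + 1)!}{(p + 1)(p + 2)\cdots(p + 2s + 1)(p + 2s + 2)}\,\beta_{2s+1},$

$c_{2s+1} = -\dfrac{(p + s)(p + q + s)}{(p + 2s)(p + 2s + 1)}, \qquad c_{2s} = \dfrac{s(q - s)}{(p + 2s - 1)(p + 2s)},$

and

(2.3)  $I_x(p, q) = \dfrac{\Gamma(p + q)\,x^p(1 - x)^q}{\Gamma(p + 1)\Gamma(q)}\left(\dfrac{1}{1+}\,\dfrac{C_1}{1+}\,\dfrac{C_2}{1+}\cdots\right),$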
where $C_i = c_i x$. By well known theorems due to Van Vleck [12] and Perron
[9] p. 347 we find (1.2) converges for $-1 < x < \infty$, and (2.3) converges for
$-\infty < x < 1$, and in the neighborhood of zero (2.2) equals (2.3). The region
of equivalence of the series and the fraction may be extended by the following
argument. Let the infinite series be terminated at some arbitrary point which
gives the desired accuracy. Then the continued fraction of the corresponding
type represents this finite series, is finite and gives the result within the desired
accuracy. The new continued fraction may also be derived by use of the hypergeometric
series [9] p. 348. A special case of (2.3) was given by Markoff [3],
pp. 135-41, [11] pp. 53-55, who applied the result only to the binomial distribution.
The associated continued fraction provides more rapid convergence than
the corresponding continued fraction. The associated continued fraction is
found by means of the hypergeometric series [9] p. 331, p. 348:

T , . r( p + 9)3^0 - x y { _ ktx hx 2 _ f fox* 

* ’ r(p + i)r(g) \ 1 + 1 + igx+ 1 + igx-f 


h = 


(2.4) 


P_±J 

P+1’ 


1, = P + 


q 

p + 2 


&«-+! — 


L+i = 


- q)(p + s)(p + q + s) 

(p + 2s - l)(p + 2 s)Hp + 2 8 + 1) * 

&(qj- s) __ (p_ + _* + l)(p+_q + s + 2) 

(p + 2 8)(p + 2 8 + 1) (p + 28 + 2)(p + 28 + 1)" ’ 


ai. 


The disadvantage of (2.4) lies in the unwieldy form of computation. For prop¬ 
erties of an associated continued fraction and the corresponding continued 
fraction in connection with convergence and the Taylor series reference is made 
to [9] p. 331 and pp. 302-303. 


3. Properties of the corresponding continued fraction. Müller and Soper
[5], [10], pointed out the inadvisability of integration through the mode
$x = \dfrac{p - 1}{p + q - 2}$. In such cases we change $I_x(p, q)$ to $I_{1-x}(q, p)$, using
$I_x(p, q) = 1 - I_{1-x}(q, p)$. Müller has shown
for his continued fraction that if we do not integrate through the mode (we
assume this in the remainder of the paragraph), convergents 2, 3, 6, 7, etc.,
will be greater than the true value and the remaining convergents will be less
than the true value provided q is an integer. However, if q is not an integer,
and is small (q < 20), it may happen that all convergents are above the true
value. In such cases we may consider whether Müller's continued fraction may
apply by estimating the remainder $I_x(p + s, q - s)$, after s reductions by parts
[10].

For the new continued fraction also

$|C_{2s}| < \dfrac{s\,|q - s|}{(p + 2s - 1)(p + 2s)}\cdot\dfrac{p - 1}{p + q - 2}, \qquad |C_{2s+1}| < \dfrac{(p + s)(p + q + s)(p - 1)}{(p + 2s)(p + 2s + 1)(p + q - 2)},$

and $C_{2s+1} < 0$; $C_{2s} > 0$ unless s > q when $C_{2s} < 0$. If $C_{2s} > 0$ then the convergents
2, 3, 6, 7, 10, 11, etc., will be above the true value and the other convergents
will be below the true value. If $C_{2s} < 0$, then all convergents will be
above the true value. In such cases, since a remainder for the continued fraction
has not been found, it seems best to estimate $I_x(p + s, q - s)$ to obtain
an idea of the error.


4. $I_x(p + s, q - s)$ and the equivalent continued fraction. Soper [10] has
given the remainder after s reductions by raising p. This will furnish an upper
bound of the error in the corresponding continued fraction after s convergents.
The remainder, when q - s is a negative integer, is approximately


(4.1) 


7.(p + «,«-«) 


2 sin (q - «W{({ - l)/2r(p + q) 

1-Tar .. 



where $ = — ■ j 
P + g 


Another approach is to use the equivalent continued fraction, for s - 1 convergents
of the equivalent continued fraction reproduce exactly s terms of the
infinite series. The infinite series and the equivalent continued fraction for the
infinite series are alike in all respects except form. By [9] p. 210, we find that
the equivalent continued fraction for (2.3) is

$W_1 = \dfrac{\gamma_1}{1 + \gamma_1 -}\,\dfrac{\gamma_2}{1 + \gamma_2 -}\,\dfrac{\gamma_3}{1 + \gamma_3 -}\,\dfrac{\gamma_4}{1 + \gamma_4 -}\cdots,$

where

$\gamma_1 = \dfrac{p + q}{p + 1}\,x, \qquad \gamma_2 = \dfrac{p + q + 1}{p + 2}\,x, \qquad \gamma_3 = \dfrac{p + q + 2}{p + 3}\,x, \qquad \gamma_r = \dfrac{p + q + r - 1}{p + r}\,x,$

and

(4.2)  $I_x(p, q) = \dfrac{\Gamma(p + q)\,x^p(1 - x)^q}{\Gamma(p + 1)\Gamma(q)}\cdot\dfrac{1}{1 - W_1}.$



The equivalent continued fraction for Müller's continued fraction is given in
[5], p. 292.


5. Numerical illustration. If $A_\nu$ and $B_\nu$ represent the numerator and the
denominator of the $\nu$-th convergent of a continued fraction
$\dfrac{a_1}{b_1+}\,\dfrac{a_2}{b_2+}\,\dfrac{a_3}{b_3+}\,\dfrac{a_4}{b_4+}\cdots$, then

(5.1)  $A_\nu = b_\nu A_{\nu-1} + a_\nu A_{\nu-2}, \qquad B_\nu = b_\nu B_{\nu-1} + a_\nu B_{\nu-2}, \qquad \nu \geq 2.$

As an example we calculate $I_{.5}(2.5, 1.5)$, which could not be done by Müller's
continued fraction.


Convergent        A               B               A/B

     1        1               1               1
     2        1               .42857143       2.3333333
     3        1.015873016     .44444444       2.2857142
     4        .66233767       .29292929       2.2610838
     5        .64812966       .28671329       2.2605498
     6        .46471308       .20559441       2.2603391
     7        .441837914      .195475117      2.2603281
     8        .33105492       .14646345       2.2603245
     9        .30890766       .13666520       2.2603242
    10        .23762461       .10512856       2.2603240
    11        .21882154       .096809808      2.2603240

Using the value of the eleventh convergent we have $I_{.5}(2.5, 1.5) = .28779339$.
Pearson [7], p. 30, gives .2877934 and Soper [10], p. 32 gives .28779341.
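
The table above can be reproduced with a few lines of code. This is a minimal sketch, assuming the new continued fraction has unit partial denominators and partial numerators $C_i = c_i x$ with $c_{2s+1} = -(p+s)(p+q+s)/((p+2s)(p+2s+1))$ and $c_{2s} = s(q-s)/((p+2s-1)(p+2s))$, and that the recurrence (5.1) is started from $A_0 = 0$, $B_0 = 1$, $A_1 = B_1 = 1$. The function names are hypothetical.

```python
import math


def c_coeff(i, p, q):
    """Coefficient c_i of the new continued fraction (i = 2s or 2s + 1)."""
    s, odd = divmod(i, 2)
    if odd:
        return -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
    return s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))


def convergents(p, q, x, count):
    """(A_v, B_v) for v = 1..count of 1/(1+ C_1/(1+ C_2/(1+ ...)))."""
    a_prev, b_prev = 0.0, 1.0  # A_0, B_0
    a_cur, b_cur = 1.0, 1.0    # A_1, B_1
    rows = [(a_cur, b_cur)]
    for v in range(2, count + 1):
        partial = c_coeff(v - 1, p, q) * x  # a_v = C_{v-1}, b_v = 1
        a_prev, a_cur = a_cur, a_cur + partial * a_prev
        b_prev, b_cur = b_cur, b_cur + partial * b_prev
        rows.append((a_cur, b_cur))
    return rows


def incomplete_beta_ratio(p, q, x, count=11):
    """I_x(p, q) from the count-th convergent times the leading factor."""
    a, b = convergents(p, q, x, count)[-1]
    log_front = (math.lgamma(p + q) - math.lgamma(p + 1) - math.lgamma(q)
                 + p * math.log(x) + q * math.log(1.0 - x))
    return math.exp(log_front) * a / b


for v, (a, b) in enumerate(convergents(2.5, 1.5, 0.5, 11), start=1):
    print(f"{v:2d}  {a:.8f}  {b:.8f}  {a / b:.7f}")
print(f"{incomplete_beta_ratio(2.5, 1.5, 0.5):.8f}")
```

The leading factor is computed through logarithms of the Gamma function only to avoid overflow for large p and q; for the arguments of the table a direct evaluation would serve equally well.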

6. Discussion of the various methods. Müller's continued fraction encounters
difficulties when q is small due to the possible divergence of the series on which
it is based. In such cases the new continued fraction works admirably. Where
“reduction by parts” [10] is advisable it would seem Müller's results will be
better, while if “integration raising p” is preferable, then the new continued
fraction would be necessary. The other methods suggested in the past lacked
in some cases remainder terms; were in other cases too long; were feasible only
in a limited range; or were only approximations. I am particularly indebted
to Professor C. C. Craig under whose guidance this study was completed.

NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON THE DISTRIBUTION OF NON-CENTRAL t
WITH AN APPLICATION


By Cecil C. Craig 
University of Michigan 


If we adopt the notation recently used by N. L. Johnson and B. L. Welch [1],
non-central t is defined by

$t = \dfrac{z + \delta}{\sqrt{w}},$

in which $\delta$ is a constant and z and w are independent variables, z being distributed
normally about zero with unit variance and w being distributed as $\chi^2/f$, in which f
is the number of degrees of freedom for $\chi^2$.

In the paper referred to Johnson and Welch discuss some applications of 
non-central t and give suitable tables calculated from the probability integral 
of the distribution of this variable. Previously tables of this probability in¬ 
tegral for the purpose of calculating the power of the t test had been given by 
J. Neyman [2] and Neyman and B. Tokarska [3]. 

It is the purpose of this note to call attention to a series expansion for the 
probability integral of non-central t which is simple in form and in most cases 
convenient for direct calculation. As an application of some intrinsic interest 
this series is used to compute in several numerical cases the power of a test 
proposed by E. J. G. Pitman [4] based on the randomization principle. 

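If for convenience we write,

$\sqrt{w} = \psi, \qquad (0 < \psi < \infty),$

we have for the joint distribution of $z + \delta$ and $\psi$,

(1)  $df(z + \delta, \psi) = \dfrac{2(f/2)^{f/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\,e^{-z^2/2}\,\psi^{f-1}e^{-f\psi^2/2}\,dz\,d\psi.$

From this

(2)  $df(t, \psi) = \dfrac{2(f/2)^{f/2}e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\sum_{r=0}^{\infty}\dfrac{(\delta t)^r}{r!}\,\psi^{f+r}e^{-(f + t^2)\psi^2/2}\,d\psi\,dt.$
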
Now this series can be integrated term by term with respect to $\psi$ over its range
and we have,

(3)  $df(t) = \dfrac{(f/2)^{f/2}e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\sum_{r=0}^{\infty}\dfrac{\Gamma[\tfrac{1}{2}(f + r + 1)]}{r!}\,(\delta t)^r\left(\dfrac{2}{f + t^2}\right)^{(f+r+1)/2}dt.$


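This series converges uniformly in any finite interval for t and it may be integrated
term by term over the entire range for t or over any part of it. In
particular, after some reduction, we get,

(4)  $P(0 < t < t_0 \mid f, \delta) = \int_0^{t_0} df(t) = \dfrac{e^{-\delta^2/2}}{2\sqrt{\pi}}\sum_{r=0}^{\infty}\dfrac{2^{r/2}\,\delta^r\,\Gamma[\tfrac{1}{2}(r + 1)]}{r!}\,I\!\left(\dfrac{r + 1}{2}, \dfrac{f}{2}; \dfrac{t_0^2}{f + t_0^2}\right),$

in which $I((r + 1)/2, f/2; t_0^2/(f + t_0^2))$ is the incomplete Beta-function in the notation
of Karl Pearson. Often what is wanted is

(5)  $P(-t_0 < t < t_0) = e^{-\delta^2/2}\sum_{k=0}^{\infty}\dfrac{(\delta^2/2)^k}{k!}\,I\!\left(k + \tfrac{1}{2}, \dfrac{f}{2}; \dfrac{t_0^2}{f + t_0^2}\right),$

to which only the terms of (4) with even r = 2k contribute.

Since the incomplete Beta-function is numerically less than unity it is seen
that the series (4) or (5) converges rapidly for moderate values of $\delta$ such as will
ordinarily occur in applications for small samples. The use of Pearson's tables
of I(p, q; x) will be convenient since interpolation will be required for only one
of the three arguments.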

As an application let us consider the test proposed by Pitman in the paper
referred to above. Two independent samples, $x_1, x_2, \ldots, x_{N_1}$, and $y_1, y_2, \ldots, y_{N_2}$,
have been drawn and it is desired in the absence of any information about
the two populations from which the samples came to test the hypothesis that
they have equal means. A test based on what may be termed the principle of
randomization for this situation has been discussed by R. A. Fisher [5] and by
E. S. Pearson [6]. It is as follows: Let the combined sample of $N_1 + N_2$ observations
be separated into sets of $N_1$ observations, $u_1, u_2, \ldots, u_{N_1}$, and $N_2$
observations, $v_1, v_2, \ldots, v_{N_2}$, in all possible ways. For each such separation
let the numerical difference of the means, $|\bar{u} - \bar{v}|$, be the spread. Then for a
suitably chosen $\alpha > 0$, we will reject the hypothesis of equal means if fewer than
$100\alpha\%$ of the $\binom{N_1+N_2}{N_1}$ spreads exceed $|\bar{x} - \bar{y}|$, and otherwise not. It is clear
that this test is fiduciary valid independently of the populations actually sampled
in the sense that if it be consistently followed for all such samples, the proportion
of cases when the hypothesis is rejected when it is true will statistically approach
$\alpha$.
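
For very small samples the randomization test can be carried out by direct enumeration. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and it counts separations whose spread is at least as large as the observed one (so the observed separation itself is included), which is one common convention and slightly more conservative than counting spreads that strictly exceed it.

```python
from fractions import Fraction
from itertools import combinations


def randomization_level(x, y):
    """Proportion of separations whose spread |u_bar - v_bar| is at least
    the observed spread |x_bar - y_bar| (exact rational arithmetic)."""
    pooled = [Fraction(v) for v in list(x) + list(y)]
    n1, n2 = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
    count = hits = 0
    # Every choice of n1 positions for the u's is one separation.
    for idx in combinations(range(n1 + n2), n1):
        u_sum = sum(pooled[i] for i in idx)
        spread = abs(u_sum / n1 - (total - u_sum) / n2)
        count += 1
        hits += spread >= observed
    return Fraction(hits, count)


print(randomization_level([1, 2, 3], [4, 5, 6]))  # 2 of the 20 separations
```

With three observations in each sample there are only 20 separations, so the smallest attainable level is 1/10; this is the practical reason the enumeration is replaced by an approximation for samples of moderate size.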

For all but very small samples it is very tedious to calculate the $\binom{N_1+N_2}{N_1}$
spreads and Pitman in his discussion shows that for quite moderate values of
$N_1$ and $N_2$ the quantity,

$w = \dfrac{\dfrac{N_1N_2}{N_1 + N_2}(\bar{u} - \bar{v})^2}{\Sigma(x - \bar{x})^2 + \Sigma(y - \bar{y})^2 + \dfrac{N_1N_2}{N_1 + N_2}(\bar{x} - \bar{y})^2},$

has a distribution which in all but very exceptional cases is quite well approximated
by a $B(\tfrac{1}{2}, \tfrac{1}{2}(N_1 + N_2 - 2))$-function. That is, the distribution of w for
the $\binom{N_1+N_2}{N_1}$ spreads may for practical purposes be found from that of t, by a
simple transformation, with $N_1 + N_2 - 2$ degrees of freedom.

It seems pertinent to make some inquiry into the power of such a test, that is,
to make an attempt to learn something about the probability that such a test
will fail to reject the hypothesis of equal means when it is in fact false. To do
this it is now necessary to specify the populations which have actually been
sampled. If we suppose that these populations are normal with equal variances
but with unequal means which, with no loss of generality, may be taken to be $\mu$
and $-\mu$ respectively, the probability integral of the distribution of non-central
t will give our answer.

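If we set

$w = \dfrac{t^2}{f + t^2},$

we have

$t = \sqrt{\dfrac{fw}{1 - w}}.$

Also,

$\dfrac{\Sigma(x - \bar{x})^2 + \Sigma(y - \bar{y})^2}{N_1 + N_2 - 2} = \dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2} = s^2,$

in which $s^2$ is the usual estimate of the population variance $\sigma^2$ based on
$f = N_1 + N_2 - 2$ degrees of freedom. Then

$t = \dfrac{\bar{x} - \bar{y}}{s}\sqrt{\dfrac{N_1N_2}{N_1 + N_2}},$

and this is a central t if $\mu = -\mu = 0$, otherwise it is non-central. In the latter
case we write (the test is made on $\bar{x} - \bar{y}$),

$t = \dfrac{(\bar{x} - \mu) - (\bar{y} + \mu) + 2\mu}{s}\sqrt{\dfrac{N_1N_2}{N_1 + N_2}} = \dfrac{z + \delta}{\psi},$

in which,

$z = \dfrac{(\bar{x} - \mu) - (\bar{y} + \mu)}{\sigma}\sqrt{\dfrac{N_1N_2}{N_1 + N_2}}, \qquad \psi = \dfrac{s}{\sigma},$

and

$\delta = \dfrac{2\mu}{\sigma}\sqrt{\dfrac{N_1N_2}{N_1 + N_2}}.$
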

In applying Pitman's test for a given significance level $\alpha$, one determines
whether or not

$P(w > w_0) \leq \alpha,$

$w_0$ being the value of w calculated from the sample. This is equivalent to finding

$P(t^2 > t_0^2),$

for the proper f, in which

$t_0^2 = \dfrac{fw_0}{1 - w_0},$

and this can be found from an ordinary table of the probability integral of the
t-distribution.

For a numerical example let $N_1 = N_2 = 10$ so that f = 18. If we adopt a 5%
significance level we have $t_0^2 = 2.101^2$ for the critical value. Let us suppose that
$\mu/\sigma = 0.1$, and calculate the probability that the hypothesis that $\mu = 0$ will be
rejected. We have $\delta^2/2 = 0.1$ and

$\dfrac{t_0^2}{f + t_0^2} = 0.1969.$

Then

$P(t^2 < t_0^2) = e^{-.1}\left[I(0.5, 9; 0.1969) + 0.1\,I(1.5, 9; 0.1969) + \dfrac{0.01}{2!}\,I(2.5, 9; 0.1969) + \cdots\right] = 0.9292.$



Four terms of the series were enough to give this result. The probability of 
rejecting the hypothesis in this case is thus 0.0708. 
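
The arithmetic of this example can be retraced in a short program. What follows is a sketch under stated assumptions, not the tabular method used in the note: the incomplete Beta-function is evaluated by a standard continued fraction instead of Pearson's tables, the series for $P(t^2 < t_0^2)$ is taken with weights $e^{-\delta^2/2}(\delta^2/2)^k/k!$ on $I(k + \tfrac{1}{2}, f/2; t_0^2/(f + t_0^2))$, and $\delta = (2\mu/\sigma)\sqrt{N_1N_2/(N_1 + N_2)}$. The function names are hypothetical.

```python
import math


def reg_inc_beta(p, q, x, terms=200):
    """I(p, q; x) from a standard continued fraction, evaluated from the tail."""
    tail = 0.0
    for i in range(terms, 0, -1):
        s, odd = divmod(i, 2)
        if odd:
            c = -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
        else:
            c = s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))
        tail = c * x / (1.0 + tail)
    log_front = (math.lgamma(p + q) - math.lgamma(p + 1) - math.lgamma(q)
                 + p * math.log(x) + q * math.log(1.0 - x))
    return math.exp(log_front) / (1.0 + tail)


def power_two_sided(f, delta, t0, terms=60):
    """P(t^2 > t0^2) for non-central t with f degrees of freedom."""
    u = t0 * t0 / (f + t0 * t0)
    half = 0.5 * delta * delta
    accept = 0.0
    weight = math.exp(-half)  # weight of the k = 0 term
    for k in range(terms):
        accept += weight * reg_inc_beta(k + 0.5, 0.5 * f, u)
        weight *= half / (k + 1)
    return 1.0 - accept


n1 = n2 = 10
delta = 2 * 0.1 * math.sqrt(n1 * n2 / (n1 + n2))  # mu / sigma = 0.1
print(round(power_two_sided(n1 + n2 - 2, delta, 2.101), 4))
```

Sixty terms are far more than the example needs; the generous default only keeps the sketch usable for the larger values of $\delta$ in the tables.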

The following tables show results for $\alpha$ = 0.05 and 0.01, $\mu/\sigma$ = 0.1, 0.2, and
0.5, and $N_1 = N_2$ = 10 and 20.
Values of $P(t^2 > t_0^2)$

N_1 = N_2 = 10

   α \ μ/σ       0.1        0.2        0.5

    0.05       0.0708     0.1355     0.5621
    0.01       0.0165     0.0396     0.2940


N_1 = N_2 = 20

   α \ μ/σ       0.1        0.2        0.5

    0.05       0.0947     0.2345     0.8691
    0.01       0.0251     0.0862     0.6730

In only one case was it necessary to calculate as many as ten terms of the 
corresponding series to obtain these values. 
NOTE ON AN APPLICATION OF RUNS TO QUALITY CONTROL CHARTS

By Frederick Mosteller 
Princeton University 

In the application of statistical methods to quality control work, a customary 
procedure is to construct a control chart with control limits spaced about the 
mean such that under conditions of statistical control, or random sampling, the 
probability of an observation falling outside these limits is a given a (e.g., .05). 
The occurrence of a point outside these limits is taken as an indication of the 
presence of assignable causes of variation in the production line. Such a form 
of chart has been found to be of particular value in the detection of the presence
of assignable causes of variability in the quality of manufactured product. As 
recently pointed out, however, the statistician may not only help to detect the 
presence of assignable causes, but also help to discover the causes themselves in 
the course of further research and development. For this purpose, runs of 
different kinds and of different lengths have been found useful by industrial 
statisticians. 1 Quality control engineers have found, at least in research and 
development work, that a convenient indication of lack of control is the occur¬ 
rence of long runs of observations whose values lie above or below that of the 
median of the sample. For example (as will be shown below), at least one suc¬ 
cession of 9 or more observations above or below the median in a sample of 40 
would be taken as evidence of lack of control at the .05 level; meaning that 
under conditions of control such a run would occur in approximately 5 per cent 
of the samples. Since this type of test has been found useful by quality control 
engineers, it is perhaps desirable to discuss the mathematical basis of such tests 
of control and provide a brief table for samples of various sizes at the signifi¬ 
cance levels .05 and .01. 

The general distribution theory of runs of k kinds of elements, and in particular 
that of two kinds has been thoroughly investigated by A. M. Mood. 2 The 
purpose of this note is to give an application of the general method to quality 
control. 

Let us consider a sample of size 2n drawn from a continuous distribution
function f(x). These are then arranged in the order in which they were drawn.
We now separate the sample into two sets by considering the nth and (n + 1)st
elements in order of magnitude, then if $x_i \leq x_n$, $x_i$ will be called an a, and if
$x_i \geq x_{n+1}$, $x_i$ will be called a b. A run of a's will be defined as usual as a succession
of a's terminated at each end by the occurrence of a b (with the obvious
exceptions where the run includes the first or last element of the sample), and


1 The use of “runs up” and “runs down” as well as runs above and below the arithmetic 
mean of a sample were briefly described in a paper by W. A. Shewhart, “Contribution of 
statistics to the science of engineering,” before the Bicentennial Celebration of the Uni¬ 
versity of Pennsylvania, September 17, 1940, to be published in the proceedings of that 
meeting. In a paper, “Mathematical statistics in mass production,” presented before the 
American Mathematical Society in February, 1941, Shewhart discussed some of the ad¬
vantages of using runs above and below the median and showed how by comparing runs of 
different types in a given problem it is often possible to fix rather definitely the source of 
trouble. The present note considers only the frequency of occurrence of “long” runs which 
are often used by research and development engineers to indicate the presence of assignable 
causes of variation. The occurrence of more than one such run in a given sequence, if dis¬ 
tributed above and below the median value may also constitute valid evidence of the 
presence of more than one state of statistical control between which the phenomena may 
oscillate. The interpretation of long runs in this sense, however, is not considered in the 
present note. 

2 A. M. Mood, “The distribution theory of runs,” Annals of Math. Stat., Vol. 11 (1940),
pp. 367-392.
runs of b's are defined similarly. A run of a's may conveniently be called a
run “below the median,” and a run of b's a run “above the median.”

We shall use Mood's notation throughout, i.e., $r_{1i}$, $r_{2i}$ (i = 1, 2, ..., n) are
the number of runs of a's and b's respectively of length i, and $r_1$, $r_2$ are the total
number of runs of a's and b's; $\left[{r_1 \atop r_{1i}}\right] = \dfrac{r_1!}{r_{11}!\,r_{12}!\cdots r_{1n}!}$ will indicate a multinomial coefficient, and
$\binom{m}{k}$ a binomial coefficient.

Also we define

$F(r_1, r_2) = 0, \qquad |r_1 - r_2| > 1,$
$F(r_1, r_2) = 1, \qquad |r_1 - r_2| = 1,$
$F(r_1, r_2) = 2, \qquad |r_1 - r_2| = 0.$

Then the distribution of runs of a's for our case is

(1)  $P(r_{1i}) = \dfrac{\left[{r_1 \atop r_{1i}}\right]\binom{n+1}{r_1}}{\binom{2n}{n}}.$


We would like to find the probability of at least one run of s or more a's. The
coefficient of $x^n$ in

(2)  $[x + x^2 + \cdots + x^{s-1}]^{r_1}$

gives the number of ways of partitioning n elements into $r_1$ partitions such that
no partition contains s or more elements, and none is void. Rewriting (2) we
have

$x^{r_1}\left[\dfrac{1 - x^{s-1}}{1 - x}\right]^{r_1},$

and the coefficient of $x^n$ is just

(3)  $\sum_{j=0}^{r_1}(-1)^j\binom{r_1}{j}\binom{n - 1 - j(s - 1)}{r_1 - 1}.$

Then the probability that we desire, of getting at least one run of s or more a's
is immediately given by

$P(r_{1i} \geq 1,\ i \geq s) = 1 - \dfrac{1}{\binom{2n}{n}}\sum_{r_1=1}^{n}\left[\sum_{j=0}^{r_1}(-1)^j\binom{r_1}{j}\binom{n - 1 - j(s - 1)}{r_1 - 1}\right]\binom{n+1}{r_1}.$
Noting that when j = 0 in the inner summation we have just the total number
of partitions, we get finally

(4)  $P(r_{1i} \geq 1,\ i \geq s) = \dfrac{1}{\binom{2n}{n}}\sum_{r_1=1}^{n}\sum_{j=1}^{r_1}(-1)^{j+1}\binom{r_1}{j}\binom{n - 1 - j(s - 1)}{r_1 - 1}\binom{n+1}{r_1}.$


A similar result of course holds for the b's.

If we desire the probability of getting at least one run of s or more of either
a's or b's, we compute the probability of getting no runs of this type and subtract
from unity. Expression (3) multiplied by the total number of ways of
getting no partitions of s or more b's for a given $r_1$, and then summed on $r_1$
gives exactly the number of ways of getting no runs of either a's or b's as great
as s. This is

(5)  $A = \sum_{r_1}\sum_{r_2}F(r_1, r_2)\left[\sum_{j}(-1)^j\binom{r_1}{j}\binom{n - 1 - j(s - 1)}{r_1 - 1}\right]\left[\sum_{i}(-1)^i\binom{r_2}{i}\binom{n - 1 - i(s - 1)}{r_2 - 1}\right],$

and the probability desired is

(6)  $P(r_{1i} \geq 1 \text{ or } r_{2i} \geq 1 \text{ or both};\ i \geq s) = 1 - \dfrac{A}{\binom{2n}{n}}.$

In spite of the complex appearance of A, the sum can be rapidly calculated for
any given s, n since the calculations for the sums on i and j need not be duplicated.

In the case of a quality control chart, we set a significance level $\alpha$ for a given n;
this determines s, the length of run of either type necessary for significance at the
level chosen. Suppose we are interested only in runs occurring on one side of
the median, say above, when $\alpha$ = .05, n = 20 (i.e., sample size equal to 40).
We determine the least value of s which will make the right hand side of equation
(4) less than or equal to .05. It turns out that s = 8 for this case. This
means that under conditions of statistical control, i.e., random sampling, one or
more runs of length 8 or more, above the median will occur in approximately
5 per cent of samples of size 40. Naturally an identical result holds when we
are considering only runs below the median.

On the other hand, if under the same conditions as given above (n = 20,
$\alpha$ = .05), we are using as our criterion of statistical control the occurrence of
runs of length s or greater either above or below the median, we must determine
the least value of s such that $1 - A/\binom{2n}{n} < .05$. This value turns out to be 9.
In other words under conditions of statistical control at least one run of at least 9
will occur either above or below the median in less than 5 per cent of the cases
on the average.
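
The two probabilities used in this example can be computed exactly. The sketch below assumes the counting argument runs as follows: the number of ways of splitting n like elements into r non-empty runs each shorter than s is $\sum_j (-1)^j \binom{r}{j}\binom{n-1-j(s-1)}{r-1}$; a given arrangement of r runs of a's can be interleaved with the b's in $\binom{n+1}{r}$ ways; and two run patterns with $r_1$ and $r_2$ runs can be interleaved in 0, 1 or 2 ways according to $|r_1 - r_2|$. Binomial coefficients with an inadmissible index are taken as zero, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binom(m, k):
    """Binomial coefficient, zero outside 0 <= k <= m."""
    return comb(m, k) if 0 <= k <= m else 0


def short_partitions(n, r, s):
    """Ways to split n elements into r non-empty runs, each shorter than s."""
    return sum((-1) ** j * binom(r, j) * binom(n - 1 - j * (s - 1), r - 1)
               for j in range(r + 1))


def p_one_side(n, s):
    """P(at least one run of s or more a's) in a sample of size 2n."""
    ways = sum(short_partitions(n, r, s) * binom(n + 1, r)
               for r in range(1, n + 1))
    return 1 - Fraction(ways, comb(2 * n, n))


def p_either_side(n, s):
    """P(at least one run of s or more of either kind) in a sample of size 2n."""
    ways = 0
    for r1 in range(1, n + 1):
        for r2 in (r1 - 1, r1, r1 + 1):
            if 1 <= r2 <= n:
                interleavings = 2 if r1 == r2 else 1
                ways += (interleavings * short_partitions(n, r1, s)
                         * short_partitions(n, r2, s))
    return 1 - Fraction(ways, comb(2 * n, n))


print(float(p_one_side(20, 8)), float(p_either_side(20, 9)))
```

The probability of a long run on each side then follows as twice the one-sided value less the either-side value, so no third function is needed.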
The following table gives smallest lengths of runs for .05 and .01 significance
levels for samples of size 10, 20, 30, 40, 50.

           Runs on one side of median     Runs on either side of median
   2n        α = .05       α = .01           α = .05       α = .01

   10           5             -                 5             -
   20           7             8                 7             8
   30           8             9                 8             9
   40           8             9                 9            10
   50           8                              10            11


If there is an odd number of individuals, say 2n + 1, in the sample, we would
choose the value of the median as the dividing line for our sample and treat the
data as if there were only 2n cases, thus ignoring the median completely.

The following table 3 gives the probabilities of getting at least one run of s
or more on one side, either side, and each side of the median for samples of size 10,
20, and 40.


 Length            2n = 10                   2n = 20                   2n = 40
 of Run     One    Either   Each      One    Either   Each      One    Either   Each
  (s)       Side    Side    Side      Side    Side    Side      Side    Side    Side

   1       1.000   1.000   1.000     1.000   1.000   1.000     1.000   1.000   1.000
   2        .976    .992    .960     1.000   1.000   1.000     1.000   1.000   1.000
   3        .500    .667    .333      .870    .956    .784      .992    .999    .986
   4        .143    .230    .056      .457    .640    .274      .799    .930    .668
   5        .024    .040    .008      .178    .293    .064      .450    .650    .249
   6                                  .060    .106    .013      .207    .346    .068
   7                                  .017    .032    .002      .087    .158    .016
   8                                  .004    .007    .000      .034    .065    .004
   9                                  .001    .001    .000      .013    .025    .001
  10                                  .000    .000    .000      .005    .009    .000
  11                                                            .002    .003    .000
  12                                                            .000    .001    .000
  13                                                            .000    .000    .000

One method of computing such a table is to use expression (4) to obtain the 
probabilities on one side, and to use (6) to get probabilities for either side. 
Then the probabilities for runs on each side may be computed by using the 
relationship 

2 P (one side) — P (either side) = P (each side). 

8 The author is indebted to Dr. P. S. Olmstead of the Bell Telephone Laboratories
for kindly placing this table at his disposal. Dr. Olmstead has pointed out that these
probabilities have been found very useful in research and development work.

TEST OF HOMOGENEITY FOR NORMAL POPULATIONS


By G. A. Baker 
University of California 

1. Introduction. In biological experiments it is often of interest to test
whether or not all the subjects can be regarded as coming from the same normal
population. If they have not come from the same normal population, usually
the most plausible alternative is that the subjects have come from a population
which is the combination of two or more normal populations combined in some
proportions. The combination of normal populations is a “smooth” alternative
to the hypothesis of a single normal population. Such non-homogeneous
populations are not the only “smooth” alternatives, of course, but are included
among the “smooth” alternatives. If there is reason to believe that the only
deviation from a normal population is due to non-homogeneity, then the results
of Professor Neyman in his paper [1] are available in studying this problem.

It is desirable not to make any hypotheses about the mean and standard 
deviation of the sampled population, but to base all computations and tests on 
the data contained in the sample. Such a viewpoint has been stressed in a 
previous paper [2] where it was shown that if the sampling is from a normal 
population, the probability of a deviation from the mean of a first sample of n 
measured in terms of the standard deviation of the sample is proportional to

(1.1)    \frac{dv}{\left(1 + \dfrac{v^2}{n+1}\right)^{n/2}}.

The result (1.1) and Neyman’s results give rise to a test of homogeneity which
is valid for “large” samples. Empirical results show that fairly conclusive
evidence of non-homogeneity may be obtained with samples of 100. Samples of 50
or less may be suggestive but rarely decisive.


2. Development of Test. Suppose that a sample of n + 1 is drawn from a 
normal population. It can be regarded as being made up of a first sample of n 
and a second sample of one. The value of v corresponding to (1.1) can then be 
computed and its distribution function is (1.1). This partition, of course, can 
be made in n + 1 ways. That is, n + 1 values of v are determined from a 
random sample of n + 1 from the original parent. It is true that these values 
of v are not independent among themselves. The correlation between the values 
of v, to a first approximation at least, is of the order of 1/n and can be neglected 
if n is “large.” 

A suitable transformation as discussed in [3], [1] and elsewhere, transforms 
(1.1) into a rectangular distribution. 

If the same computations are made when the sampled population is not
normal, then the resulting values obtained will not be rectangularly distributed.
For instance, suppose that the sampled population is


( 2 . 1 ) 


*\/2v 


f(x) - -4= 


we find that the distribution of v based on the first sample of 2 is a very
complicated expression involving sums of exponentials and definite integrals of
exponentials. To obtain a rectangular distribution if the sampled population is
normal, the appropriate transformation to make is

(2.2)    v = -\sqrt{3}\,\cot \pi u, \qquad dv = \sqrt{3}\,\pi \csc^2 \pi u \, du.

The resulting u-distribution for population (2.1) then is to be compared with
the rectangular distribution in the interval from zero to one.

For “large” values of n + 1 and for symmetrical non-homogeneous populations
composed of two normal components, the u-distribution will be symmetrical
about u = 1/2, less than one near the ends, greater than one for values
of u moderately far from 1/2 and less than one for values of u near 1/2. A Neyman
\psi^2 [1] of order 4 will be necessary to detect a difference of this sort. If the
non-homogeneous population of two components is skewed, the u-distribution
will still show the same two-humped effect but may be skewed instead of
symmetrical. A Neyman \psi^2 of order 4 should still be computed, although
\psi_3^2 may be more significant.

The test then consists of:

(a) computing the n + 1 quantities

(2.3)    x'_i = \frac{x_i - \bar{x}}{\sqrt{n+1}\; s},    (i = 1, 2, 3, ..., n + 1)

where

    n + 1 = number in the sample
    x_i = the observed values
    x_j = the observed values except x_i
    \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j
    s^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2

(b) making the transformation

    u_i = \int_{-\infty}^{x'_i} \frac{y_0 \, dx'}{(1 + x'^2)^{n/2}},    (i = 1, 2, 3, ..., n + 1)

(c) computing the first four \psi_k^2's of Neyman’s paper [1]

(d) comparing with \psi_\epsilon^2(k) as found from the Incomplete Gamma Function
Tables.

If n is large, say n = 100, then u is given approximately by the normal
probability integral.

If n is small, the values of u are obtained from Table 25 of Vol. 2 of
Pearson's Tables.

Neyman's derivation assumes that n + 1 is large and that the u's are
independent. In this case, if n + 1 is large, then the u's are nearly independent,
and hence the test is valid. The same procedure can be applied for smaller
samples. It cannot be expected that small differences from normal in the
sampled population can be detected with small samples. Empirical results
indicate that samples of 100 are necessary for decisive results even when the
differences of the sampled population from a normal homogeneous population
are large. Samples of 50 may be suggestive and in very extreme cases might be
decisive.
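
Steps (a) to (c) may be sketched in code as follows. This is only an
illustration under stated assumptions: the function names are hypothetical,
the integral in (b) is evaluated numerically through the substitution
x' = tan θ (which turns the integrand into cos^(n-2) θ) rather than read from
tables, and the statistic in (c) is taken to be Neyman's cumulative sum of
squared, normalized sums of the orthonormal Legendre polynomials on (0, 1),
as that statistic is usually stated. The sample must contain at least three
values that are not all equal.

```python
import math

# Critical values of psi_k^2 at the .05 level for k = 1..4, copied from Table I.
CRITICAL_05 = [3.842, 5.992, 7.815, 9.488]


def leave_one_out_ratios(x):
    """Step (a): x'_i = (x_i - mean) / (sqrt(n+1) * s), where mean and s are
    computed from the n observations other than x_i (s^2 uses divisor n)."""
    x = list(x)
    n = len(x) - 1
    ratios = []
    for i, xi in enumerate(x):
        rest = x[:i] + x[i + 1:]
        mean = sum(rest) / n
        s = math.sqrt(sum((xj - mean) ** 2 for xj in rest) / n)
        ratios.append((xi - mean) / (math.sqrt(n + 1) * s))
    return ratios


def _cos_power_integral(upper, power, steps=2000):
    """Simpson's rule for the integral of cos(theta)**power on [-pi/2, upper]."""
    lower = -math.pi / 2
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * max(math.cos(lower + k * h), 0.0) ** power
    return total * h / 3


def to_uniform(ratio, n):
    """Step (b): u = integral from -infinity to x' of y0 / (1 + x^2)^(n/2)."""
    full = _cos_power_integral(math.pi / 2, n - 2)
    return _cos_power_integral(math.atan(ratio), n - 2) / full


def neyman_psi_squared(us):
    """Step (c): cumulative statistics psi_k^2 for k = 1, 2, 3, 4."""
    polys = [
        lambda y: math.sqrt(12) * y,
        lambda y: math.sqrt(5) * (6 * y ** 2 - 0.5),
        lambda y: math.sqrt(7) * (20 * y ** 3 - 3 * y),
        lambda y: 210 * y ** 4 - 45 * y ** 2 + 9 / 8,
    ]
    count = len(us)
    values, running = [], 0.0
    for poly in polys:
        running += sum(poly(u - 0.5) for u in us) ** 2 / count
        values.append(running)
    return values


def homogeneity_statistics(sample):
    """Steps (a)-(c) chained together for one sample of n + 1 values."""
    n = len(sample) - 1
    us = [to_uniform(r, n) for r in leave_one_out_ratios(sample)]
    return neyman_psi_squared(us)
```

Step (d) then amounts to comparing the k-th returned value with the k-th
entry of CRITICAL_05 (or with the corresponding .01 values of Table I).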


TABLE I

Empirical Sampling Results

                                          k = 1     k = 2     k = 3     k = 4

\psi_k^2's for 51 from population A        .0001      .843     2.009     7.464
\psi_k^2's for 101 from population A       .086      2.403     4.998    12.868
\psi_k^2's for 101 from population B       .553       .927     7.472     7.485
\psi_k^2's for 101 from normal             .017       .082     1.288     1.663
\psi_{(.05)}^2(k)'s (Neyman [1])          3.842      5.992     7.815     9.488
\psi_{(.01)}^2(k)'s (Neyman [1])          6.635      9.210    11.345    13.277


It is to be noted that the test makes no assumption about the parameters of
the sampled population and does not group the data. The application of the
test gives a unique result that does not depend on the judgment of the computer
in any respect. In applying the usual chi-square test the computer must choose 
groupings. The choice of groupings as indicated in [5] may change the P-values 
to very different levels of significance. 

3. Empirical results. Samples of 51 and 101 from population A, of 101 from
population B, and of 101 from a normal population, were drawn by throwing
dice. Populations A and B are given in [4]. Population A is symmetrical and
distinctly bimodal. Population B is weakly bimodal and strongly skewed.

For samples from population A it is necessary to compute \psi_4^2. For samples
from population B it may be sufficient to compute \psi_3^2. The non-homogeneity
of the type of population A seems to be somewhat more detectable than of the
type of population B. The sample from the normal parent shows close
conformity with expectation.

In applying the proposed test for homogeneity the u-values for small
independent sets of data can be combined to give a much larger number of u-values.

A NOTE ON THE POWER OF THE SIGN TEST

By W. Mac Stewart 
University of Wisconsin 

1. Introduction. Let us consider a set of N non-zero differences, of which x
are positive and N - x are negative; and suppose that the hypothesis tested,
H_0, implies, in independent sampling, that x will be distributed about an
expected value of N/2 in accordance with the binomial (1/2 + 1/2)^N. As a quick
test of H_0, we may choose to test the hypothesis h_0 that x has the above
probability distribution. Defining r to be the smaller of x and N - x, the test
consists in rejecting h_0 and therefore H_0 whenever r ≤ r(ε, N), where r(ε, N) is
determined by N and the significance level ε.

2. Power of a test. In applying such a test it is of interest to know how
frequently it will lead to a rejection of H_0 when H_0 is false and the situation H_1
implies that the probability law of x is (q + p)^N, with p ≠ q thereby indicating
an expectation of an unequal number of + and - differences. The probability
of rejecting H_0 when H_1, implying p = p_1, is true, is termed the power of
the test of H_0 relative to the alternative H_1.¹ Thus, from the point of view of
experimental design the power (P) of the test of H_0 may be considered a function
of the alternative hypothesis H_1, the significance level ε, and N. As such,
the following observations may be noted:

1. The power P_2, for an assumed ε, N, and H_2 implying p = p_2, is greater
than or equal to the power P_1 for ε, N, and H_1 implying p = p_1, where
|p_2 - .50| > |p_1 - .50|.

2. The power P_2 for an assumed H_1, N, and ε_2 is greater than or equal to the
power P_1 for H_1, N, and ε_1, where ε_2 > ε_1.

3. The power P_2 for an assumed H_1, ε, and N_2 is greater than or equal to the
power P_1 for H_1, ε, and N_1, where N_2 > N_1.

Hence, to increase the power of the test of H_0 relative to a particular H_1,
the methods implied in observations 2 and/or 3 may be employed. However,
if any increase in an established ε is undesirable, the method implied in
observation 3 is the alternative.

1 For an extensive discussion of the power of a test, the reader is referred to
J. Neyman and E. S. Pearson, Statistical Research Memoirs, Vol. 1 (1936), pp. 3-6.
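
The quantities discussed in this section can be computed directly from the
binomial law. The following is a minimal sketch, assuming the two-sided
rejection rule r ≤ r(ε, N) of section 1; the function names are illustrative
and do not come from the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X binomial with n trials and success probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def critical_r(n, eps=0.05):
    """Largest r with P(min(x, N - x) <= r | p = 1/2) <= eps.
    Returns None when even r = 0 exceeds the significance level."""
    best = None
    for r in range((n - 1) // 2 + 1):  # r < N/2, so the two tails are disjoint
        if 2 * binom_cdf(r, n, 0.5) <= eps:
            best = r
        else:
            break
    return best


def power(n, r, p):
    """P(x <= r) + P(x >= N - r) when each difference is positive with
    probability p, i.e. the probability of rejecting H_0 under H_1."""
    return binom_cdf(r, n, p) + binom_cdf(r, n, 1 - p)


# With N = 6 only r = 0 is significant at the .05 level, and the power
# against p_1 = .75 is .75**6 + .25**6, about .18.
print(critical_r(6), power(6, 0, 0.75))
```

With N = 6 and p_1 = .75 the power is only about .18, which illustrates
observation 3: for a fixed ε the practical remedy is a larger N.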

3. Explanation of table. In the interests of efficiency and economy, two
questions then arise: (1) What is the minimum value of N which, at the
significance level ε, will give the test of H_0 a power P > β, relative to a particular
alternative hypothesis H_1? (2) For this minimum value of N corresponding
to ε, what is the maximum value of r? Stated in another manner, the questions
are these: (1) “What is the smallest number (min N) of paired samples that must
be employed in conjunction with the Sign Test in order that the test of H_0,
at the significance level ε, shall have a power P > β relative to an alternative
hypothesis H_1?” (2) “If x of these paired samples give rise to a positive
difference, and (min N - x) a negative difference, and if r be defined as the smaller
of x and (min N - x); then, what is the maximum value that r may attain and
still have the results, at the level ε, judged significant?”

Table I provides the answers to these questions for the significance level
ε ≤ .05; and (1) for H_1 implying p = p_1 for values of p_1 from .60 to .95 (and
thereby from .40 to .05) at intervals of .05; (2) for values of β from .05 to .95
at intervals of .05, and also for P > .99. For example, assume that a power
P > .80 relative to the alternative hypothesis H_1 (p_1 = .70) is desired. In
Table I, the entry appearing in the column headed H_1 (p_1 = .70), and in the
row P > .80 is 49,17, indicating that 49 paired samples are required, of which
17 or less must be of one sign (+ or -) and hence 32 or more must be of the
opposite sign in order that the results be significant at the .05 level.

Because of the discreteness of the binomial distribution, it is impossible to
maintain the level of significance at .05 or even arbitrarily close to that figure
and still hold to the criterion that N shall be at a minimum. For that reason,
particularly when min N is small, results significant at .05 according to Table I
may be significant at a level ε', where ε' is considerably less than .05. In general,
however, and in particular when min N is large (greater than 50) both the
quantities (.05 - ε') and (P - β) are small.

4. Illustrative example. Goulden² describes a simple experiment in identifying
varieties of wheat. In this experiment, a wheat “expert” is presented
paired grain samples of two particular varieties of wheat. The object of the
experiment is to test the ability of the expert to differentiate between the two
varieties by arranging the pairs so that samples of one variety are on the left,
say, and samples of the other variety are on the right.

2 C. H. Goulden, Methods of Statistical Analysis, John Wiley and Sons, New York, 1939,
p. 2.

In a problem of this type, it is desirable to have a sufficiently large number, N,
of paired samples in order that the following conditions be fulfilled: (1) The
probability that a person possessing no discriminating ability pass the test
TABLE I

Minimum number of paired samples and maximum values of related r

H_0: p_0 = .50
(5% level of significance, i.e., ε ≤ .05)
(min N, max r)

                  H_1       H_1       H_1       H_1       H_1       H_1       H_1       H_1
   POWER       p_1=.95   p_1=.90   p_1=.85   p_1=.80   p_1=.75   p_1=.70   p_1=.65   p_1=.60

 0 < P < .05      —         —         —         —         —         —        7,0       6,0
 P > .05          —         —         —         —         —        7,0       6,0       9,1
 P > .10          —         —         —         —        7,0       6,0       9,1      17,4
 P > .15          —         —         —        8,0       6,0       9,1      12,2      25,7
 P > .20          —         —         —        7,0      10,1      13,2      17,4      37,12
 P > .25          —         —        8,0       6,0      14,2      12,2      23,6      44,15
 P > .30          —         —        7,0      11,1       9,1      18,4      25,7      56,20
 P > .35          —         —        6,0      10,1      12,2      17,4      30,9      65,24
 P > .40          —        8,0        —        9,1      16,3      20,5      35,11     74,28
 P > .45          —        7,0      11,1        —       15,3      26,7      42,14     89,35
 P > .50          —        6,0      10,1      13,2      18,4      25,7      44,15    101,40
 P > .55          —         —        9,1      12,2      17,4      30,9      51,18    112,45
 P > .60          —         —       14,2      15,3      20,5      36,11     56,20    125,51
 P > .65         7,0      11,1      13,2      19,4      23,6      35,11     63,23    143,59
 P > .70         6,0      10,1      12,2      18,4      25,7      40,13     67,25    158,66
 P > .75          —        9,1      16,3      17,4      28,8      44,15     79,30    175,74
 P > .80          —       14,2      15,3      20,5      30,9      49,17     90,35    199,85
 P > .85        11,1      12,2      18,4      25,7      35,11     56,20    101,40    227,98
 P > .90         9,1      15,3      17,4      28,8      42,14     65,24    114,46    263,115
 P > .95        12,2      17,4      23,6      35,11     49,17     79,30    143,59    327,145
 P > .99        15,3      23,6      30,9      44,15     67,25    110,44    199,85    453,205


through sheer guesswork be less than ε; and (2) if past experience has proven
that an expert does possess the ability to discriminate between the varieties to
the extent of placing a proportion, p_1, of the pairs correctly in the long run,
then the probability that he will pass the test be P.

Under these conditions, how large an N is required, and for that N, what is
the maximum number of pairs that may be incorrectly placed without failing
the test? For alternative hypothesis H_1 (p_1 = .75), and for P > .90, referring
to Table I, it is seen that 42 paired samples must be employed and not more
than 14 may be placed incorrectly. Under the same alternative hypothesis, if
it be required merely that P > .50 (i.e., an expert with an ability of .75 have
better than an even chance of passing), then only 18 paired samples are necessary
and not more than 4 may be arranged incorrectly.

Thus, before conducting an experiment in which the Sign Test is to be
employed, if the experimenter first decides what power the test must have relative
to a certain alternative hypothesis, then from the accompanying table he may
learn the minimum number of paired samples that are necessary, and the related
maximum value of r.

If this procedure is not followed, and an experimenter employs, say 6 paired 
samples, he may (as can be seen from the table) discover, to his dismay, that 
“experts” of ability .75 will be unrecognized more than 80% of the time. 


MOMENTS OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE 
DIFFERENCE TO THE MEAN SQUARE DIFFERENCE IN 
SAMPLES FROM A NORMAL UNIVERSE 

By J. D. Williams 
Phoenix , Arizona 

The following result may have considerable application to trend analysis. 
The specific problem was proposed to me by R. H. Kent. 

Consider a sample O_n: X_1, X_2, ..., X_n from a normal population with zero
mean and variance \sigma^2, the variates being arranged in temporal order. We seek
the moments of the ratio of \delta^2 to S^2, where

(1)    (n - 1)\,\delta^2 = \sum_{j=1}^{n-1} (X_j - X_{j+1})^2

and

(2)    n S^2 = \sum_{j=1}^{n} (X_j - \bar{X})^2.

Here \bar{X} is the mean of the X_j. In order to simplify the algebra, we will work
with quantities A and B defined by

(3)    2\sigma^2 A = (n - 1)\,\delta^2,
       2\sigma^2 B = n S^2.

The characteristic function for the joint distribution of A and B is

(4)    \varphi(t_1, t_2) = E(e^{A t_1 + B t_2})
           = (2\pi\sigma^2)^{-n/2} \int \cdots \int \exp\left( A t_1 + B t_2 - \frac{1}{2\sigma^2} \sum_{j=1}^{n} X_j^2 \right) \prod_{j=1}^{n} dX_j,

where t_1 and t_2 are pure imaginaries. For the method of analysis which will be
used here t_1 and t_2 will be considered as real variables. By straightforward
methods we have

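(5)    \varphi^{-2}(t_1, t_2) =
       \begin{vmatrix}
       a & b & d & \cdot & \cdot & \cdot & d \\
       b & c & b & d & \cdot & \cdot & d \\
       d & b & c & b & d & \cdot & d \\
       \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
       d & \cdot & d & b & c & b & d \\
       d & \cdot & \cdot & d & b & c & b \\
       d & \cdot & \cdot & \cdot & d & b & a
       \end{vmatrix}

where the determinant is of nth order and its elements are

(6)    a = 1 - t_1 - (n - 1)T
       b = t_1 + T
       c = 1 - 2t_1 - (n - 1)T
       d = T = t_2/n.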

It can be verified that the determinant has the value 


(7) 


j-o \ 3 / 


U) 


n-j~i 


where the symbol \binom{\cdot}{\cdot} represents a binomial coefficient. From (7) we
find the moments m_j of A/B as follows: Setting


( 8 ) 

we have 

(9) 


i fat 


1 


my 


//•••/ 


d*<p(ti, tt) 


dt{ 


ti dhk 

< 1*0 b~ 1 


(n - l)(n + 1) • • • (n + 2j - 3)' 


The result is rather unexpected, for we have established that the moments of
A/B are equal to the moments of A divided by the moments of B.
We find the following explicit values for the first few moments m_j:

(10)    m_0 = 1
        m_1 = 2
        (n - 1)(n + 1) m_2 = 4(n^2 + n - 3)
        (n - 1)(n + 1)(n + 3) m_3 = 8(n^3 + 6n^2 + 2n - 21)
        (n - 1)(n + 1)(n + 3)(n + 5) m_4 = 16(n^4 + 14n^3 + 53n^2 - 8n - 231).

These are valid subject to the restriction 2n - 1 > j, because in arriving at the
explicit forms we have treated the binomial coefficient \binom{k}{j} as if it were
identically equal to k(k - 1) ··· (k - j + 1)/j!.

From (10) it is easy to pass to the moments of \delta^2/S^2. For example, we
find the mean value and variance of \delta^2/S^2 to be

    \frac{2n}{n - 1}

and

    \frac{4n^2 (n - 2)}{(n + 1)(n - 1)^3}

respectively.
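
The passage from (10) to these two expressions can be traced with exact
arithmetic. By (3) the ratio \delta^2/S^2 equals [n/(n - 1)] A/B, so its mean is
[n/(n - 1)] m_1 and its variance is [n/(n - 1)]^2 (m_2 - m_1^2). The sketch
below merely checks this algebra for a range of n; the function names are
illustrative.

```python
from fractions import Fraction


def m1(n):
    """First moment of A/B from (10)."""
    return Fraction(2)


def m2(n):
    """Second moment of A/B from (10)."""
    return Fraction(4 * (n * n + n - 3), (n - 1) * (n + 1))


def mean_and_variance(n):
    """Mean and variance of delta^2 / S^2 = [n / (n - 1)] * A / B."""
    scale = Fraction(n, n - 1)
    mean = scale * m1(n)
    variance = scale ** 2 * (m2(n) - m1(n) ** 2)
    return mean, variance


for n in range(2, 60):
    mean, variance = mean_and_variance(n)
    assert mean == Fraction(2 * n, n - 1)
    assert variance == Fraction(4 * n * n * (n - 2), (n + 1) * (n - 1) ** 3)
```

For n = 2 the variance vanishes, as it should, since with two observations
the ratio \delta^2/S^2 is identically 4.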




ON THE INTEGRAL EQUATION OF RENEWAL THEORY

By Willy Feller 
Brown University 

1. Introduction. In this paper we consider the behavior of the solutions of
the integral equation

(1.1)    u(t) = g(t) + \int_0^t u(t - x) f(x)\,dx,

where f(t) and g(t) are given non-negative functions.¹ This equation appears,
under different forms, in population theory, the theory of industrial replacement
and in the general theory of self-renewing aggregates, and a great number of
papers have been written on the subject.² Unfortunately most of this literature
is of a heuristic nature so that the precise conditions for the validity of different
methods or statements are seldom known. This literature is, moreover, abundant
in controversies and different conjectures which are sometimes supported
or disproved by unnecessarily complicated examples. All this renders an
orientation exceedingly difficult, and it may therefore be of interest to give a
rigorous presentation of the theory. It will be seen that some of the previously
announced results need modifications to become correct.

The existence of a solution u(t) of (1.1) could be deduced directly from a
well-known result of Paley and Wiener [21] on general integral equations of form
(1.1).³ However, the case of non-negative functions f(t) and g(t), with which
we are here concerned, is much too simple to justify the deep methods used by
Paley and Wiener in the general case. Under the present conditions, the
existence of a solution can be proved in a simple way using properties of completely
monotone functions, and this method has also the distinct advantage of showing
some properties of the solutions, which otherwise would have to be proved
separately. It will be seen in section 3 that the existence proof becomes most
natural if equation (1.1) is slightly generalized. Introducing the summatory
functions

(1.2)    U(t) = \int_0^t u(x)\,dx,    F(t) = \int_0^t f(x)\,dx,    G(t) = \int_0^t g(x)\,dx,

1 For the interpretation of the equation cf. section 2.

2 Lotka’s paper [8] contains a bibliography of 74 papers on our subject published before
1939. Yet it is stated that even this list “is not the result of an exhaustive search.” At
the end of the present paper the reader will find a list of 16 papers on (1.1) which have
appeared during the two years since the publication of Lotka’s paper.

3 This has been remarked also by Hadwiger [3].

equation (1.1) can be rewritten in the form

(1.3)    U(t) = G(t) + \int_0^t U(t - x)\,dF(x).


However, (1.3) has a meaning even if F(t) and G(t) are not integrals, provided
F(t) is of bounded total variation and the integral is interpreted as a Stieltjes
integral. Now for many practical applications (and even for numerical
calculations) this generalized form of the integral equation seems to be the most
appropriate one and, as a matter of fact, it has sometimes been used in a more or
less hidden form (e.g., if all individuals of the parent population are of the same
age). Our existence theorem refers to this generalized equation.

We then turn to one of the main problems of the theory, namely the asymptotic
behavior of u(t) as t → ∞. It is generally supposed that the solution u(t)
“in general” either behaves like an exponential function, or that it approaches
in an oscillating manner a finite limit q; the latter case should arise if
\int_0^\infty f(t)\,dt = 1, thus in particular in the cases of a stable population and of
industrial replacement. However, special examples have been constructed to show
that this is not always so.⁴ In order to simplify the problem and to get more
general conditions, we shall first (section 4) consider only the question of
convergence in mean, that is to say, we shall study the asymptotic behavior not of
u(t) itself but of the mean value u*(t) = \frac{1}{t} \int_0^t u(x)\,dx. The question can
be solved completely using only the simplest Tauberian theorems for Laplace
integrals. Of course, if u(t) → q then also u*(t) → q, but not conversely. The
investigation of the precise asymptotic behavior of u(t) is more delicate and
requires more refined tools (section 5).

Most of section 6 is devoted to a study of Lotka’s well-known method of 
expanding u(t) into a series of oscillatory components, and it is hoped that this 
study will help clarify the true nature of this expansion. It will be seen that 
Lotka's method can be justified (with some necessary modifications) even in 
some cases for which it was not intended, e.g., if the characteristic equation has 
multiple or negative real roots, or if it has only a finite number of roots. On 
the other hand limitations of the method will also become apparent: thus it 
can occur in special cases that a formal application of the method will lead to a 
function u(t) which apparently solves the given equation, whereas in reality it 
is the solution of quite a different equation. 

Of course, most of the difficulties mentioned above arise only when the
function f(t) has an infinite tail. However, it is known that even computational
considerations sometimes require the use of such curves, and, as a matter of fact,
exponential and Pearsonian curves have been used most frequently in
connection with (1.1). It will be seen that even in these special cases customary
methods may lead to incorrect results. Besides, our considerations show how
much the solution u(t) is influenced by the values of f(t) for t → ∞, and,
accordingly, that extreme caution is needed in practice. The last section contains
some simple remarks on the practical computation of the solution.

4 Cf. Hadwiger [2] and also Hadwiger, “Zur Berechnung der Erneuerungsfunktion nach
einer Formel von V. A. Kostitzin,” Mitt. Verein. schweizerischer Versich.-Math., Vol. 34
(1937), pp. 37-43.

2. Generalities on equations (1.1) and (1.3). This section contains a few 
remarks on the meaning of our integral equation and on an alternative form 
under which it is encountered in the literature. A reader interested only in the 
abstract theory may pass immediately to section 3. 

Equation (1.1) can be interpreted in various ways; the most important among 
them are the following two: 

(i) In the theory of industrial replacement (as outlined in particular by Lotka),
it is assumed that each individual dropping out is immediately replaced by a
new member of zero age. f(t) denotes the density of the probability at the
moment of installment that an individual will drop out at age t. The function
g(t) is defined by

(2.1)    g(t) = \int_0^\infty \eta(x) f(t + x)\,dx,

where \eta(x) represents the age distribution of the population at the moment
t = 0 (so that the number of individuals of an age between x and x + \delta x is
\eta(x)\delta x + o(\delta x)). Obviously g(t) then represents the rate of dropping out at
time t of individuals belonging to the parent population. Finally, u(t) denotes
the rate of dropping out at time t of individuals of the total population. Now
each individual dropping out at time t belongs either to the parent population,
or it came to the population by the process of replacement at some moment
t - x (0 < x < t), and hence u(t) satisfies (1.1). It is worthwhile to note that
in this case

(2.2)    \int_0^\infty f(t)\,dt = 1,

since f(t) represents a density of probability.

(ii) In population theory u(t) measures the rate of female births at time t > 0.
The function f(t) now represents the reproduction rate of females at age t (that
is to say, the average number of female descendants born during (t, t + \delta t)
from a female of age t is f(t)\delta t + o(\delta t)). If \eta(x) again stands for the age
distribution of the parent population at t = 0, the function g(t) of (2.1) will obviously
measure the rate of production of females at time t by members of the parent
population. Thus we are again led to (1.1), with the difference, however, that
this time either of the inequalities

(2.3)    \int_0^\infty f(t)\,dt \gtrless 1

may occur; the value of this integral shows the tendency of increase or decrease
in the total population.

Theoretically speaking, f(t) and g(t) are two arbitrary non-negative functions.
It is true that g(t) is connected with f(t) by (2.1); but, since the age distribution
\eta(x) is arbitrary, g(t) can also be considered as an arbitrarily prescribed function.

It is hardly necessary to interpret the more general equation (1.3) in detail:
it is the straightforward generalization of (1.1) to the case where the increase or
decrease of the population is not necessarily a continuous process. This form
of the equation is frequently better adapted to practical needs. Indeed, the
functions f(t) and g(t) are usually determined from observations, so that only
their mean values over some time units (years) are known. In such cases it is
sometimes simpler to treat f(t) and g(t) as discontinuous functions, using
equation (1.3) instead of (1.1). For some advantages of such a procedure
see section 7. It may also be mentioned that the most frequently (if not the
only) special case of (1.1) studied is that where g(t) = f(t). Now it is apparent
from (2.1) that this means that all members of the parent population are of
zero age: in this case, however, there is no continuous age-distribution \eta(x).
Instead we have to use a discontinuous function \eta(x) and write (2.1) in the form
of a Stieltjes integral. Thus discontinuous functions and Stieltjes integrals
present themselves automatically, though in a somewhat disguised form, even
in the simplest cases.
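
When f and g are known only as values per unit of time, equation (1.3)
reduces to a simple recursion. The sketch below is an illustration under
assumptions that are not in the text: time runs in unit steps, f[j] is the
probability of dropping out at age j with f[0] = 0, g[k] is the contribution
of the parent population at step k, and the function name is hypothetical.

```python
def renewal_sequence(f, g):
    """Discrete analogue of (1.3): u[k] = g[k] + sum_{j=1..k} u[k-j] * f[j].
    f[0] is ignored, which corresponds to no replacement at age zero."""
    u = []
    for k in range(len(g)):
        upper = min(k, len(f) - 1)
        convolution = sum(u[k - j] * f[j] for j in range(1, upper + 1))
        u.append(g[k] + convolution)
    return u


# Special case g = f (every member of the parent population has age zero),
# with geometrically distributed lifetimes: f[j] = p * (1 - p)**(j - 1).
p = 0.25
steps = 40
f = [0.0] + [p * (1 - p) ** (j - 1) for j in range(1, steps + 1)]
u = renewal_sequence(f, f)
print(u[1:6])
```

In this special case the recursion returns the constant rate p at every step
after the first, so that the renewal rate neither oscillates nor drifts.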

At this point a remark may be inserted which will prove useful for a better
understanding later on (section 6). In the current literature we are frequently
confronted not with (1.1) but with

(2.4)    u(t) = \int_0^\infty u(t - x) f(x)\,dx,

together with the explanation that it is asked to find a solution of (2.4) which
reduces, for t < 0, to a prescribed function h(t). Now such a function, as is
known, exists only under very exceptional conditions, and (2.4) is by no means
equivalent to (1.1). The current argument can be boiled down to the following.
Suppose first that the function g(t) of (1.1) is given in the special form

(2.5)    g(t) = \int_t^\infty h(t - x) f(x)\,dx,


where h(x) is a non-negative function defined for x < 0. Since the solution
u(t) of (1.1) has a meaning only for t > 0, we are free to define that u(-t) =
h(-t) for t > 0. This arbitrary definition, then, formally reduces (1.1) to (2.4).
It should be noted, however, that this function u(t) does not, in general, satisfy
(2.4) for t < 0, for h(t) was prescribed arbitrarily. Thus we are not, after all,
concerned with (2.4) but with (1.1), which form of the equation is, by the way,
the more general one for our purposes. If there really existed a solution of
(2.4) which reduced to h(t) for t < 0, we could of course define g(t) by (2.5) and
transform (2.4) into (1.1) by splitting the interval (0, ∞) into the subintervals





(0, t) and (t, ∞). However, as was already mentioned, a solution of the required
kind does not exist in general. It will also be seen (section 6) that the true
nature of the different methods and the limits of their applicability can be
understood only when the considerations are based on the proper equation (1.1) and
not on (2.4).

3. Existence of solutions. 

Theorem 1. Let F(t) and G(t) be two finite non-decreasing functions which
are continuous to the right.⁵ Suppose that

(3.1)    F(0) = G(0) = 0,

and that the Laplace integrals⁶

(3.2)    \varphi(s) = \int_0^\infty e^{-st}\,dF(t),    \gamma(s) = \int_0^\infty e^{-st}\,dG(t)

converge at least for s > \sigma \ge 0.⁷ In case that \lim_{s \to \sigma+0} \varphi(s) > 1, let
\sigma' > \sigma be the root⁸ of the characteristic equation \varphi(s) = 1; in case
\lim_{s \to \sigma+0} \varphi(s) \le 1, put \sigma' = \sigma.

Under these conditions there exists for t > 0 one and only one finite non-decreasing
function U(t) satisfying (1.3). With this function the Laplace integral

(3.3)    \omega(s) = \int_0^\infty e^{-st}\,dU(t)


6 It is needless to emphasize that this restriction is imposed only to avoid trivial am¬ 
biguities. 

6 The integrals (3.2) should be interpreted as Lebesgue-Stieltjes integrals over open 
intervals; thus 

vis) -» lim f e~ §t dF(t ), 

«-+o J, 

which implies that <p(s) 0 as 8 —► «. Alternatively it can be supposed that F(t) and 

< 7(0 have no discontinuities at t ** 0. Continuity of Fit) at t * 0 means that there is no 
reproduction at zero age. This assumption is most natural for our problem, but is by no 
means necessary. In order to investigate the case where F(t) has a saltus c > 0 at t =■ 0, 
one should take the integrals (3.2) over the closed set [0, *> ], so that 

v>(«) — c + lim f e~ s *dF(t). 

«-+o J, 

It is readily seen that Theorem 1 and itB proof remain valid if 0 < c < 1. However, if 
c > 1, then (1.3) plainly has no solution U(t). The continuity of G(t) at l — 0 is of no 
importance and is not used in the sequel. 

7 The condition is formulated in this general way in view of later applications (cf., e.g., 
the lemma of section 4). In all cases of practical interest a * 0. 

8 <p(s) is, of course, monotonic for <r and tends to zero as s —► ». In order to ensure 
the existence of a root of ?(«) ■= 1, it is sufficient to suppose that the saltus c of F(t) at t ® 0 
is less than 1 (cf. footnote 6). 
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converges for * > o', and 

®' 4 > 

Proof: A trivial computation shows that for any finite non-decreasing solution
$U(t)$ of (1.3) and any $T > 0$ we have

$\int_0^T e^{-st}\,dU(t) = \int_0^T e^{-st}\,dG(t) + \int_0^T e^{-sx}\,dF(x) \int_0^{T-x} e^{-st}\,dU(t);$

herein all terms are non-negative and hence by (3.2)

$\int_0^T e^{-st}\,dU(t) \le \gamma(s) + \varphi(s) \int_0^T e^{-st}\,dU(t).$

Now $\varphi(s) < 1$ for $s > \sigma'$, and hence it is seen that the integral (3.3) exists for
$s > \sigma'$ and satisfies (3.4). On the other hand it is well-known that the values of
$\omega(s)$ for $s > \sigma'$ determine the corresponding function $U(t)$ uniquely, except for
an additive constant, at all points of continuity. However, from (1.3) and (3.1)
it follows that $U(0) = 0$ and, since by (1.3) $U(t)$ is continuous to the right, the
monotone solution $U(t)$ of (1.3), if it exists, is determined uniquely.
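
The step from the last inequality to (3.4) can be spelled out as follows. This is only a
sketch of the reasoning under the hypotheses of Theorem 1; the passage to the limit in the
double integral uses monotone convergence, which is an assumption about how the step is
meant to be justified rather than something stated in the text.

```latex
% For s > \sigma' we have 0 \le \varphi(s) < 1, so the inequality can be solved:
\[
  \bigl(1-\varphi(s)\bigr)\int_0^T e^{-st}\,dU(t) \le \gamma(s)
  \quad\Longrightarrow\quad
  \int_0^T e^{-st}\,dU(t) \le \frac{\gamma(s)}{1-\varphi(s)} .
\]
% The left-hand integral is non-decreasing in T and bounded, hence (3.3) converges.
% Letting T \to \infty in the identity that precedes the inequality then gives
\[
  \omega(s) = \gamma(s) + \varphi(s)\,\omega(s)
  \quad\Longrightarrow\quad
  \omega(s) = \frac{\gamma(s)}{1-\varphi(s)} ,
\]
% which is (3.4).
```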

To prove the existence of $U(t)$ consider a function $\omega(s)$ defined for $s > \sigma'$ by
(3.4). It is clear from (3.2) that $\varphi(s)$ and $\gamma(s)$ are completely monotone functions,
that is to say that $\varphi(s)$ and $\gamma(s)$ have, for $s > \sigma$, derivatives of all orders
and that $(-1)^n \varphi^{(n)}(s) \ge 0$ and $(-1)^n \gamma^{(n)}(s) \ge 0$. We can therefore differentiate
(3.4) any number of times, and it is seen that $\omega^{(n)}(s)$ is continuous for $s > \sigma'$.
Now a simple inductive argument shows that $(-1)^n \omega^{(n)}(s)$ is a product of
$\{1 - \varphi(s)\}^{-(n+1)}$ by a finite number of completely monotone functions. It
follows that $(-1)^n \omega^{(n)}(s) \ge 0$, so that $\omega(s)$ is a completely monotone function,
at least for $s > \sigma'$. Hence it follows from a well-known theorem of S. Bernstein
and D. V. Widder⁹ that there exists a non-decreasing function $U(t)$ such that
(3.3) holds for $s > \sigma'$. Moreover, this function can obviously be so defined that
$U(0) = 0$ and that it is continuous to the right. Using $U(t)$ let us form a new
function

(3.5) $V(t) = \int_0^t U(t - x)\,dF(x).$

$V(t)$ is clearly non-negative and non-decreasing. It is readily verified (and, of
course, well-known) that

$\psi(s) = \int_0^\infty e^{-st}\,dV(t) = \omega(s)\,\varphi(s).$

It follows, therefore, from (3.4) that $\psi(s) = \omega(s) - \gamma(s)$, and this implies, by the


9 This theorem has been repeatedly proved by several authors; for a recent proof cf.
Feller [19].

uniqueness theorem for Laplace transforms, that $V(t) = U(t) - G(t)$. Combining
this result with (3.5) it is seen that $U(t)$ is a solution of (1.3).

Theorem 2. Suppose that $f(t)$ and $g(t)$ are measurable, non-negative and
bounded in every finite interval $0 < t < T$. Let the integrals

(3.6) $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt, \qquad \gamma(s) = \int_0^\infty e^{-st} g(t)\,dt$

converge for $s > \sigma$. Then there exists one and only one non-negative solution $u(t)$
of (1.1) which is bounded in every finite interval¹⁰. With this function the integral

(3.7) $\omega(s) = \int_0^\infty e^{-st} u(t)\,dt$

converges at least for $s > \sigma'$, where $\sigma' = \sigma$ if $\lim_{s \to \sigma+0} \varphi(s) \le 1$, and
otherwise $\sigma' > \sigma$ is defined as the root of the characteristic equation $\varphi(s) = 1$.
For $s > \sigma'$ equation (3.4) holds.

If $f(t)$ is continuous except, perhaps, at a finite number of points then $u(t) - g(t)$
is continuous.

Proof: Define $F(t)$ and $G(t)$ by (1.2). Under the present conditions these
functions satisfy the conditions of Theorem 1, and hence (1.3) has a non-decreasing
solution $U(t)$. Consider, then, an arbitrary interval $0 < t < T$ and suppose
that in this interval $f(t) < M$ and $g(t) < M$. If $0 < t < t + h < T$ we have
by (1.3)

$0 \le \frac{1}{h}\{U(t+h) - U(t)\}$

$\quad = \frac{1}{h}\{G(t+h) - G(t)\} + \frac{1}{h}\int_t^{t+h} U(t+h-x) f(x)\,dx + \frac{1}{h}\int_0^t \{U(t+h-x) - U(t-x)\} f(x)\,dx$

$\quad \le M + M\,U(T) + \frac{M}{h}\int_0^t \{U(t+h-x) - U(t-x)\}\,dx$

$\quad = M + M\,U(T) + \frac{M}{h}\int_t^{t+h} U(y)\,dy - \frac{M}{h}\int_0^h U(y)\,dy$

$\quad \le M + 2M\,U(T).$

Thus $U(t)$ has bounded difference ratios and is therefore an integral. The
derivative $U'(t)$ exists for almost all $t$ and $0 \le U'(t) < M$. Accordingly we can
differentiate (1.3) formally, and since $U(0) = 0$ it follows that $u(t) = U'(t)$
satisfies (1.1) for almost all $t$. However, changing $u(t)$ on a set of measure zero
does not affect the integral in (1.1), and since $g(t)$ is defined for all $t$ it is seen that

10 Without the assumptions of positiveness and boundedness this theorem reduces to a
special case of a theorem by Paley and Wiener [21]; cf. section 1, p. 243.

$u(t)$ can be defined, in a unique way, so as to satisfy (1.1) and obtain (1.3).
Since the solution of (1.3) was uniquely determined it follows that the solution
$u(t)$ is also unique. Obviously equations (3.7) and (3.3) define the same function
$\omega(s)$, so that (3.4) holds, and (3.7) converges for $s > \sigma'$.

Finally, if $f(t)$ has only a finite number of jumps, the continuity of $u(t) - g(t)$
becomes evident upon writing (1.1) in the form

$u(t) - g(t) = \int_0^t u(x) f(t - x)\,dx.$


4. Asymptotic properties. In this section we shall be concerned with the
asymptotic behavior as $t \to \infty$ not of $u(t)$ itself but of the mean value
$u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau$. If $u(t)$ tends to the (not necessarily finite) limit $C$,
then obviously also $u^*(t) \to C$, whereas the converse is not necessarily true. For the
proof of the theorem we shall need the following obvious but useful

Lemma: If $u(t) \ge 0$ is a solution of (1.1) and if

(4.1) $u_1(t) = e^{kt} u(t), \qquad f_1(t) = e^{kt} f(t), \qquad g_1(t) = e^{kt} g(t),$

then $u_1(t)$ is a solution of

$u_1(t) = g_1(t) + \int_0^t u_1(t - x) f_1(x)\,dx.$
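
The lemma is a one-line computation; the following worked equation is a sketch of it. The
symbol $\varphi_1$ for the transform of $f_1$ is introduced here for illustration only and does
not occur in the text.

```latex
% Multiply (1.1) by e^{kt} and split e^{kt} = e^{k(t-x)} e^{kx} under the integral:
\begin{align*}
  u_1(t) = e^{kt}u(t)
    &= e^{kt}g(t) + \int_0^t e^{k(t-x)}u(t-x)\,e^{kx}f(x)\,dx \\
    &= g_1(t) + \int_0^t u_1(t-x)\,f_1(x)\,dx .
\end{align*}
% On the transform side the substitution is a shift of the argument:
\[
  \varphi_1(s) = \int_0^\infty e^{-st}f_1(t)\,dt = \varphi(s-k),
\]
% so that the choice k = -\sigma' gives \varphi_1(0) = \varphi(\sigma') = 1.
```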

Theorem 3: Suppose that using the functions defined in Theorem 2 the integrals

(4.2) $\int_0^\infty f(t)\,dt = a, \qquad \int_0^\infty g(t)\,dt = b$

are finite.

(i) In order that

(4.3) $u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau \to C$

as $t \to \infty$, where $C$ is a positive constant, it is necessary and sufficient that $a = 1$,
and that the moment

(4.4) $\int_0^\infty t f(t)\,dt = m_1$

be finite. In this case

(4.5) $C = \frac{b}{m_1}.$

(ii) If $a < 1$ we have

(4.6) $\int_0^\infty u(t)\,dt = \frac{b}{1 - a}.$

(iii) If $a > 1$ let $\sigma'$ be the positive root of the characteristic equation $\varphi(s) = 1$
(cf. (3.2)) and put¹¹

(4.7) $\int_0^\infty e^{-\sigma' t}\, t f(t)\,dt = m_1.$

Then

(4.8) $\lim_{t \to \infty} \frac{1}{t}\int_0^t e^{-\sigma' \tau} u(\tau)\,d\tau = \frac{\gamma(\sigma')}{m_1}.$

Remark: The case $a = 1$ corresponds in demography to a population of
stationary size. In the theory of industrial replacement only the case $a = 1$
occurs; the moment $m_1$ is the average lifetime of an individual. The case $a > 1$
corresponds in demography to a population in which the fertility is greater than
the mortality. As is seen from (4.8), in this case the mean value of $u(t)$ increases
exponentially. It is of special interest to note that in a population with $a < 1$
the integral (4.6) always converges.

Proof: By (4.2) and (3.7)

(4.9) $\lim_{s \to +0} \varphi(s) = a, \qquad \lim_{s \to +0} \gamma(s) = b.$

If $a < 1$, it follows from (3.4) that $\lim_{s \to +0} \omega(s) = b/(1 - a)$ is finite. Since
$u(t) \ge 0$ this obviously implies that (4.6) holds. This proves (ii).

If $a = 1$ and $m_1$ is finite, it is readily seen that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = m_1,$

and hence by (3.4)

$\lim_{s \to +0} s\,\omega(s) = \lim_{s \to +0} \gamma(s) \cdot \lim_{s \to +0} \frac{s}{1 - \varphi(s)} = \frac{b}{m_1}.$

By a well-known Tauberian theorem for Laplace integrals of non-negative
functions¹² it follows that $u^*(t) \to b/m_1$. Conversely, if (4.3) holds it is readily
seen that¹³


11 (4.2) implies the finiteness of $m_1$.

12 Cf. e.g. Doetsch [18], p. 208 or 210.

13 Indeed, if (4.3) holds and if $U(t)$ is defined by (1.2), then there is an $M = M(\epsilon)$
such that $|U(t) - Ct| < M + \epsilon t$. Now

$\omega(s) = s \int_0^\infty e^{-st}\, U(t)\,dt, \qquad s\,\omega(s) - C = s^2 \int_0^\infty e^{-st} (U(t) - Ct)\,dt,$

and hence

$|s\,\omega(s) - C| < s^2 \int_0^\infty e^{-st} (M + \epsilon t)\,dt = sM + \epsilon.$

$\lim_{s \to +0} s\,\omega(s) = C,$

which in turn implies by (3.4) and (4.9) that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = \frac{b}{C}.$

This obviously means that the moment (4.4) exists and equals $b/C$. This
proves (i).

Finally case (iii) reduces immediately to (i) using the above lemma with
$k = -\sigma'$, since then $\int_0^\infty f_1(t)\,dt = \varphi(\sigma') = 1$. This finishes the proof.
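
Part (i) of Theorem 3 can be illustrated numerically. The sketch below discretizes (1.1)
with the trapezoidal rule; the function name `solve_renewal`, the step count, and the test
case $f(t) = g(t) = t e^{-t}$ are choices made here for illustration and are not taken from
the text. For this case $a = b = 1$ and $m_1 = 2$, so the theorem predicts $u^*(t) \to 1/2$;
the closed form $u(t) = (1 - e^{-2t})/2$ follows from (3.4), since $\omega(s) = 1/(s(s+2))$.

```python
import math


def solve_renewal(f, g, t_max, n):
    """Approximate u(t) = g(t) + int_0^t u(t-x) f(x) dx on a uniform grid.

    The convolution integral is replaced by the trapezoidal rule, which
    gives a linear recursion for the grid values u[0], ..., u[n].
    """
    h = t_max / n
    fv = [f(i * h) for i in range(n + 1)]
    gv = [g(i * h) for i in range(n + 1)]
    u = [0.0] * (n + 1)
    u[0] = gv[0]  # the integral vanishes at t = 0
    for i in range(1, n + 1):
        # interior nodes x_j (j = 1..i-1) and the endpoint x = t (j = i)
        acc = sum(u[i - j] * fv[j] for j in range(1, i)) + 0.5 * u[0] * fv[i]
        # the endpoint x = 0 contributes 0.5*h*u[i]*fv[0]; solve for u[i]
        u[i] = (gv[i] + h * acc) / (1.0 - 0.5 * h * fv[0])
    return h, u


def density(t):
    """Illustrative choice f(t) = g(t) = t e^{-t}: a = b = 1, m_1 = 2."""
    return t * math.exp(-t)


t_max, n = 40.0, 2000
h, u = solve_renewal(density, density, t_max, n)

# mean value u*(t_max) = (1/t_max) int_0^t_max u, again by the trapezoidal rule
mean = h * (0.5 * u[0] + sum(u[1:-1]) + 0.5 * u[-1]) / t_max

print("u(t_max)  =", u[-1])
print("u*(t_max) =", mean)
```

The printed values should lie close to $b/m_1 = 1/2$; the mean value approaches the limit
more slowly than $u(t)$ itself because it still carries the initial transient.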

It may be remarked that the finiteness of the integrals (4.2) is by no means
necessary for (4.3). This is shown by the following

Example: Let

$f(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} e^{-1/(4t)}, \qquad g(t) = \frac{1}{\sqrt{\pi t}}\, e^{-1/(4t)}.$

It is readily seen that with these functions $a = 1$, but $b = \infty$. Now¹⁴
$\varphi(s) = e^{-\sqrt{s}}$ and $\gamma(s) = e^{-\sqrt{s}}/\sqrt{s}$, so that

$\omega(s) = \frac{e^{-\sqrt{s}}}{\sqrt{s}\,(1 - e^{-\sqrt{s}})}.$

Thus $s\,\omega(s) \to 1$ as $s \to +0$, and hence $u^*(t) \to 1$. In this particular case it can
even be shown that the solution $u(t)$ itself tends to 1 as $t \to \infty$.
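
The limit $s\,\omega(s) \to 1$ is easy to check by direct evaluation. The following is a
minimal sketch; the helper name `s_omega` is hypothetical, and `math.expm1` is used only to
avoid cancellation in $1 - e^{-\sqrt{s}}$ for small $s$.

```python
import math


def s_omega(s):
    """Return s * omega(s) for omega(s) = exp(-sqrt(s)) / (sqrt(s) * (1 - exp(-sqrt(s))))."""
    r = math.sqrt(s)
    # -expm1(-r) equals 1 - exp(-r) but stays accurate when r is small
    return r * math.exp(-r) / (-math.expm1(-r))


values = [(s, s_omega(s)) for s in (1.0, 1e-2, 1e-4, 1e-8)]
for s, v in values:
    print(f"s = {s:g}   s*omega(s) = {v:.6f}")
```

The values increase towards 1 as $s$ decreases, in agreement with the statement above.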

In practice, however, the integrals (4.2) will always exist, and accordingly we 
restrict the consideration to this case. 


5. Closer study of asymptotic properties. In this section we shall deal almost
exclusively with the most important special case, namely where

(5.1) $\int_0^\infty f(t)\,dt = 1.$

The question has been much discussed whether in this case necessarily $u(t) \to C$
as $t \to \infty$, which statement, if true, would be a refinement of (4.3). Hadwiger
[2] has constructed a rather complicated example to show that $u(t)$ does not
necessarily approach a limit. Now this can also be seen directly and without
any computations. Indeed, if $u(t) \to C$ and if (5.1) holds, then obviously

$\lim_{t \to \infty} \int_0^t u(t - x) f(x)\,dx = C,$

and hence it follows from (1.1) that $g(t) \to 0$. In order that $u(t) \to C$ it is therefore


14 The integrals can be evaluated by elementary methods, and are known; cf. e.g.
Doetsch [18], p. 25.

necessary that $g(t) \to 0$, and this proves the assertion. In Hadwiger's example
$\limsup g(t) = \infty$, which makes his computations unnecessary.

It can be shown in a similar manner that not even the condition $g(t) \to 0$ is
sufficient to ensure that $u(t) \to C$. Some restriction as to the total variation of
$f(t)$ seems both necessary and natural (conditions on the existence of derivatives
are not sufficient). In the following theorem we shall prove the convergence of
$u(t)$ under a condition which is, though not strictly necessary, sufficiently wide
to cover all cases of any possible practical interest.

Theorem 4: Suppose that with the functions $f(t)$ and $g(t)$ of Theorem 2

(5.2) $\int_0^\infty f(t)\,dt = 1, \qquad \int_0^\infty g(t)\,dt = b < \infty.$

Suppose moreover that there exists an integer $n \ge 2$ such that the moments

(5.3) $m_k = \int_0^\infty t^k f(t)\,dt, \qquad k = 1, 2, \dots, n,$

are finite, and that the functions $f(t), t f(t), t^2 f(t), \dots, t^{n-2} f(t)$ are of bounded total
variation over $(0, \infty)$. Suppose finally that

(5.4) $\lim_{t \to \infty} t^{n-2} g(t) = 0 \quad$ and $\quad \lim_{t \to \infty} t^{n-2} \int_t^\infty g(x)\,dx = 0.$

Then

(5.5) $\lim_{t \to \infty} u(t) = \frac{b}{m_1}$

and

(5.6) $\lim_{t \to \infty} t^{n-2} \left\{ u(t) - \frac{b}{m_1} \right\} = 0.$


Remark: As it was shown in section 4, the case where $\int_0^\infty f(t)\,dt > 1$
can readily be reduced to the above theorem by applying the lemma of section 4
with $k = -\sigma'$, where $\sigma'$ is the positive root of $\varphi(s) = 1$: it is only necessary to
suppose that $e^{-\sigma' t} f(t)$ is of bounded total variation and that $e^{-\sigma' t} g(t) \to 0$.
Obviously all moments of $e^{-\sigma' t} f(t)$ exist, so that the above theorem shows that
$u_1(t) = e^{-\sigma' t} u(t)$ tends to the finite limit $b'/m_1'$, where

$b' = \int_0^\infty e^{-\sigma' t} g(t)\,dt, \qquad m_1' = \int_0^\infty e^{-\sigma' t}\, t f(t)\,dt.$

Thus in this case and under the above assumptions $u(t) \sim \frac{b'}{m_1'}\, e^{\sigma' t}$, so that the
renewal function increases exponentially as could be expected. If however

$\int_0^\infty f(t)\,dt < 1,$

$u(t)$ will in general not show an exponential character. If $f(t)$ is of bounded
variation and has a finite moment of second order, and if $g(t) \to 0$, then it can be
shown that $u(t) \to 0$. However, the lemma of section 4 can be applied only if
the integral defining $\varphi(s)$ converges in some negative $s$-interval containing a value
$s'$ such that $\varphi(s') = 1$, and this is in general not the case.
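
A small worked case may make the reduction for $\int_0^\infty f(t)\,dt > 1$ concrete. The choice
$f(t) = g(t) = 2e^{-t}$ is an illustration introduced here and does not come from the text.

```latex
% Illustrative case f(t) = g(t) = 2e^{-t}, so that a = 2 > 1.
\[
  \varphi(s) = \gamma(s) = \frac{2}{s+1}, \qquad
  \varphi(\sigma') = 1 \;\Longrightarrow\; \sigma' = 1 .
\]
% Quantities of the Remark:
\[
  b' = \int_0^\infty e^{-t}\,2e^{-t}\,dt = 1, \qquad
  m_1' = \int_0^\infty e^{-t}\,t\,2e^{-t}\,dt = \tfrac12, \qquad
  \frac{b'}{m_1'} = 2 .
\]
% Exact solution from (3.4):
\[
  \omega(s) = \frac{\gamma(s)}{1-\varphi(s)} = \frac{2}{s-1}
  \quad\Longrightarrow\quad u(t) = 2e^{t},
\]
% which agrees with u(t) \sim (b'/m_1')\,e^{\sigma' t}. Direct check in (1.1):
\[
  2e^{-t} + \int_0^t 2e^{t-x}\,2e^{-x}\,dx
  = 2e^{-t} + 2e^{t}\bigl(1-e^{-2t}\bigr) = 2e^{t} .
\]
```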

Proof: The proof of Theorem 4 will be based on a Tauberian theorem due to
Haar¹⁵. With some specializations and obvious changes this theorem can be
formulated as follows.

Suppose that $l(t)$ is, for $t > 0$, non-negative and continuous, and that the
Laplace integral

(5.7) $\lambda(s) = \int_0^\infty e^{-st}\, l(t)\,dt$

converges for $s > 0$. Consider $\lambda(s)$ as a function of the complex variable
$s = x + iy$ and suppose that the following conditions are fulfilled:

(i) For $y \ne 0$ the function $\lambda(s)$ (which is always regular for $x > 0$) has
continuous boundary values $\lambda(iy)$ as $x \to +0$, for $x > 0$ and $y \ne 0$

(5.8) $\lambda(s) = \frac{C}{s} + \psi(s),$

where $\psi(iy)$ has finite derivatives $\psi'(iy), \dots, \psi^{(r)}(iy)$ and $\psi^{(r)}(iy)$ is bounded
in every finite interval;

(ii) $\int_{-\infty}^{+\infty} e^{ity} \lambda(x + iy)\,dy$ converges for some fixed $x > 0$ uniformly with
respect to $t > T > 0$;

(iii) $\lambda(x + iy) \to 0$ as $y \to \pm\infty$, uniformly with respect to $x > 0$;

(iv) $\lambda'(iy), \lambda''(iy), \dots, \lambda^{(r)}(iy)$ tend to zero as $y \to \pm\infty$;

(v) The integrals

$\int_{-\infty}^{y_1} e^{ity} \lambda^{(r)}(iy)\,dy \quad$ and $\quad \int_{y_2}^{\infty} e^{ity} \lambda^{(r)}(iy)\,dy$

(where $y_1 < 0$ and $y_2 > 0$ are fixed) converge uniformly with respect to $t > T > 0$.
Under these conditions

(5.9) $\lim_{t \to \infty} t^r \{ l(t) - C \} = 0.$

Now the hypotheses of this theorem are too restrictive to be applied to the
solution $u(t)$ of (1.1). We shall therefore replace (1.1) by the more special
equation

(5.10) $v(t) = h(t) + \int_0^t v(t - x) f(x)\,dx,$


15 Haar [20] or Doetsch [18], p. 269.

where

(5.11) $h(t) = \int_0^t f(t - x) f(x)\,dx.$

Plainly Theorem 2 can be applied to (5.10). It is also plain that $h(t)$ is bounded
and non-negative and that (by (5.1))

(5.12) $\int_0^\infty h(t)\,dt = 1,$

(5.13) $\chi(s) = \int_0^\infty e^{-st} h(t)\,dt = \varphi^2(s).$

Accordingly we have by Theorem 2

(5.14) $\zeta(s) = \int_0^\infty e^{-st} v(t)\,dt = \frac{\varphi^2(s)}{1 - \varphi(s)}.$

We shall first verify that $\zeta(s)$ satisfies the conditions of Haar's theorem with
$r = n - 2$. For this purpose we write

(5.15) $f(t) = f_1(t) - f_2(t),$

where $f_1(t)$ and $f_2(t)$ are non-decreasing and non-negative functions which are,
by assumption, bounded:

(5.16) $0 \le f_1(t) < M, \qquad 0 \le f_2(t) < M.$


(a) We show that $v(t)$ is continuous. Now by Theorem 2 the solution $v(t)$
of (5.10) is certainly continuous if $h(t)$ is continuous; however, that $h(t)$ is
continuous follows directly from (5.11) and the fact that the functions

$\int_0^t f_1(t - x) f(x)\,dx \quad$ and $\quad \int_0^t f_2(t - x) f(x)\,dx$

are continuous.

(b) In view of (5.1) the function $\varphi(s)$ exists for $x = \Re(s) \ge 0$. Obviously
$|\varphi(x + iy)| < 1$ for $x > 0$. Now

$1 - \varphi(iy) = \int_0^\infty (1 - e^{-iyt}) f(t)\,dt = \int_0^\infty (1 - \cos yt) f(t)\,dt + i \int_0^\infty \sin yt \cdot f(t)\,dt,$

and, since $1 - \cos yt \ge 0$ and $f(t) \ge 0$, the equality $\varphi(iy) = 1$ for $y \ne 0$ would
imply that $f(t) = 0$ except on a set of measure zero. It is therefore seen that
$\varphi(x + iy) \ne 1$ for all $x > 0$ and for $x = 0$, $y \ne 0$.

It follows furthermore from (5.3) that for $k = 1, \dots, n$ and $x \ge 0$ the derivatives

$\varphi^{(k)}(s) = \int_0^\infty (-t)^k e^{-st} f(t)\,dt$

exist and that

$\lim_{x \to +0} \varphi^{(k)}(x + iy) = \varphi^{(k)}(iy).$

Finally, it is readily seen that in the neighborhood of $y = 0$ we have

(5.17) $\varphi(iy) = \int_0^\infty e^{-iyt} f(t)\,dt = 1 - m_1 iy + \frac{m_2}{2!}(iy)^2 - + \cdots + (-1)^{n-1} \frac{m_{n-1}}{(n-1)!} (iy)^{n-1} + O(|y|^n).$


(c) From what was said under (b) it follows by (5.14) that $\zeta(s)$ is regular for
$x > 0$, and that $\zeta(s), \zeta'(s), \dots, \zeta^{(n)}(s)$ approach continuous boundary values
as $s = x + iy$ approaches a point of the imaginary axis other than the origin.
Now put

(5.18) $\psi(s) = \frac{\varphi^2(s)}{1 - \varphi(s)} - \frac{1}{m_1 s},$

so that by (5.14)

(5.19) $\zeta(s) = \frac{1}{m_1 s} + \psi(s).$

For $x > 0$ and $x = 0$, $y \ne 0$ the function $\psi(x + iy)$ is obviously continuous;
the derivatives $\psi'(iy), \dots, \psi^{(n)}(iy)$ exist. To investigate the behavior of
$\psi(iy)$ in the neighborhood of $y = 0$ put


(5.20) $P(y) = m_1 - \frac{m_2}{2!}(iy) + - \cdots + (-1)^n \frac{m_{n-1}}{(n-1)!} (iy)^{n-2}.$

By (5.17), (5.18) and (5.20)

(5.21) $\psi(iy) = \left[ \frac{\{1 - iy P(y)\}^2}{P(y)} - \frac{1}{m_1} \right] \frac{1}{iy} + O(|y|^{n-2}).$

Now the expression in brackets represents an analytic function of $y$ which
vanishes at $y = 0$. Hence $\psi(iy) = \mathfrak{P}(y) + O(|y|^{n-2})$, where $\mathfrak{P}(y)$ denotes a
power series. It follows that the derivatives $\psi'(iy), \dots, \psi^{(n-2)}(iy)$ exist for all
real $y$ (including $y = 0$) and are bounded for sufficiently small $|y|$: since they
are continuous functions they are bounded in every finite interval.

(d) Next we show that there exists a constant $A > 0$ such that for sufficiently
large $|y|$

(5.22) $|\varphi(x + iy)| < \frac{A}{|y|}$

uniformly in $x \ge 0$. By (5.15)

(5.23) $\varphi(s) = \int_0^\infty \{\cos yt - i \sin yt\}\, e^{-xt} \{f_1(t) - f_2(t)\}\,dt.$

Now $f_1(t)$ is non-decreasing and accordingly by the second mean-value theorem
we have for any $T > 0$ and $y$

$\int_0^T \cos yt \cdot f_1(t)\,dt = f_1(T) \int_\tau^T \cos yt\,dt = f_1(T)\, \frac{\sin Ty - \sin \tau y}{y},$

where $\tau$ is some value between 0 and $T$ (depending, of course, on $y$; at points of
discontinuity, $f_1(T)$ should be replaced by $\lim_{t \to T-0} f_1(t)$). Hence by (5.16)

| ^ cos yt • e“*‘ . 

Treating the other terms in (5.23) in a like manner, (5.22) follows.

Combining (5.22) with (5.14) it is seen that for sufficiently large $|y|$

$|\zeta(x + iy)| = O(|y|^{-2})$

uniformly in $x > 0$. This shows that the assumptions (ii) and (iii) of Haar's
theorem are satisfied for $\lambda(s) = \zeta(s)$. In order to prove that also conditions
(iv) and (v) are satisfied it suffices to notice that the proof of (5.22) used only
the fact that $f(t)$ is of bounded total variation. Now $\varphi^{(k)}(s)$ is the Laplace
transform of $(-t)^k f(t)$, and, since $t^k f(t)$ is of bounded total variation for $k \le n - 2$,
it follows that

$|\varphi^{(k)}(s)| = O(|y|^{-1}), \qquad k = 1, 2, \dots, n - 2,$

for sufficiently large $|y|$, uniformly in $x > 0$. Differentiating (5.14) $k$ times it is
also seen that

$|\zeta^{(k)}(s)| = O(|y|^{-2}), \qquad k = 1, 2, \dots, n - 2,$

as $y \to \pm\infty$, uniformly with respect to $x > 0$.

This enumeration shows that $l(t) = v(t)$ and $\lambda(s) = \zeta(s)$ satisfy all hypotheses
of Haar's theorem with $r = n - 2$ and $C = 1/m_1$. Hence

(5.24) $\lim_{t \to \infty} t^{n-2} \left\{ v(t) - \frac{1}{m_1} \right\} = 0.$

Returning now to (5.14) we get

$\omega(s) = \gamma(s) + \gamma(s)\varphi(s) + \gamma(s)\zeta(s),$

or, by the uniqueness property of Laplace integrals,

(5.25) $u(t) = g(t) + \int_0^t g(x) f(t - x)\,dx + \int_0^t g(x) v(t - x)\,dx = g(t) + u_1(t) + u_2(t)$

(which relation can also be checked directly using (5.10)). Let us begin with
the last term. We have by (5.2)

$u_2(t) - \frac{b}{m_1} = \int_0^t g(t - x) \left\{ v(x) - \frac{1}{m_1} \right\} dx - \frac{1}{m_1} \int_t^\infty g(y)\,dy,$

and hence

$t^{n-2} \left| u_2(t) - \frac{b}{m_1} \right| \le 2^{n-2} \int_{t/2}^{t} g(t - x)\, x^{n-2} \left| v(x) - \frac{1}{m_1} \right| dx + t^{n-2} \int_{t/2}^{t} g(y) \left| v(t - y) - \frac{1}{m_1} \right| dy + \frac{t^{n-2}}{m_1} \int_t^\infty g(y)\,dy.$

If $t$ is sufficiently large we have by (5.24) in the first integral
$x^{n-2} \left| v(x) - \frac{1}{m_1} \right| < \epsilon$. In the second integral $v(t - y) - \frac{1}{m_1}$ is bounded,
and hence by (5.4)

$\lim_{t \to \infty} t^{n-2} \left\{ u_2(t) - \frac{b}{m_1} \right\} = 0.$

The same argument applies (even with some simplifications) also to the second
term in (5.25); it follows that

$\lim_{t \to \infty} t^{n-2} u_1(t) = 0,$

whilst $t^{n-2} g(t) \to 0$ by assumption (5.4). Now the assertion (5.6) of our theorem
follows in view of (5.25) if the last three relationships are added. This finishes
the proof of Theorem 4.

It seems that the solution $u(t)$ is generally supposed to oscillate around its
limit $b/m_1$ as $t \to \infty$. It goes without saying that such a behavior is a priori
more likely than a monotone character. It should, however, be noticed that
there is no reason whatsoever to suppose that $u(t)$ always oscillates around its
limit. Again no computation is necessary to see this, as shown by the following

Example: Differentiating (1.1) formally we get

$u'(t) = g'(t) + g(0) f(t) + \int_0^t u'(t - x) f(x)\,dx,$

which shows that, if $g(t)$ and $f(t)$ are sufficiently regular, $u'(t)$ satisfies an integral
equation of the same type as $u(t)$. Thus if

$g'(t) + g(0) f(t) \ge 0$

for all $t$, we shall have $u'(t) \ge 0$, and $u(t)$ is a monotone function. In particular,
if $g'(t) + g(0) f(t) = 0$, then $u'(t) = 0$ and $u(t) = \text{const}$. For example, let
$f(t) = g(t) = e^{-t}$. Then $\varphi(s) = \gamma(s) = 1/(s + 1)$ and hence $\omega(s) = 1/s$, which is the
Laplace transform of $u(t) = 1$. It is also seen directly that $u(t) \equiv 1$ is the
solution. We have however the following

Theorem 5¹⁶: If the functions $f(t)$ and $g(t)$ of Theorem 4 vanish identically for
$t > T > 0$, then the solution $u(t)$ of (1.1) oscillates around its limit $b/m_1$ as $t \to \infty$.

16 Under some slight additional hypotheses and with quite different methods this theorem
was proved by Richter [16].

Proof: For $t > T$ equation (1.1) reduces to

$u(t) = \int_{t-T}^{t} u(x) f(t - x)\,dx,$

and since $\int_0^T f(x)\,dx = 1$ it follows that the maxima of $u(t)$ in the intervals
$nT < t < (n + 1)T$ form, for sufficiently large integers $n$, a non-increasing sequence.
Similarly the corresponding minima do not decrease. Since $u(t) \to b/m_1$, by
Theorem 4, it follows that the minima do not exceed $b/m_1$ and the maxima are
not smaller than $b/m_1$.

6. On Lotka's method. Probably the most widely used method for treating
equation (1.1) in connection with problems of the renewal theory is Lotka's
method. As a matter of fact this method consists of two independent parts.
The first step aims at obtaining the exact solution of (1.1) in the form of a series
of exponential terms (this is achieved by an adaptation of a method which was
used by P. Hertz and Herglotz for other purposes). The second part of Lotka's
theory consists of devices for a convenient approximative computation of the
first few terms of the series. While restricting ourselves formally to Lotka's
theory, it will be seen that some of the following remarks apply equally to other
methods.

Lotka's method rests essentially on the fundamental assumption that the
characteristic equation

(6.1) $\varphi(s) = 1$

has infinitely many distinct simple¹⁷ roots $s_0, s_1, \dots$, and that the solution $u(t)$
of (1.1) can be expanded into a series

(6.2) $u(t) = \sum_k A_k e^{s_k t}$

where the $A_k$ are complex constants. The argument usually rests on an assumed
completeness-property of the roots. Thus, starting from (2.4) it is required that
(6.2) reduces to $h(t)$ for $t < 0$; in other words, that an arbitrarily prescribed
function $h(x)$ be, for $x < 0$, representable in the form

(6.3) $h(x) = \sum_k A_k e^{s_k x} \qquad (x < 0).$

In practice we are, of course, usually not concerned with $h(t)$ but with $g(t)$ (cf.
(2.5)), and according to Lotka's theory the coefficients $A_k$ of the solution (6.2)
of (1.1) can be computed directly from $g(t)$ in a way similar to the computation
of the Fourier coefficients.

Lotka’s method is known to lead to correct results in many cases and also to 

17 Hadwiger [3] objected to the assumption that all roots of (6.1) be simple. The
modifications which are necessary to cover the case of multiple roots also will be indicated below.

have distinct computational merits. On the other hand it seems to require a
safer justification, since its fundamental assumptions are rarely realized. Thus
clearly an arbitrary function $h(x)$ cannot be represented in the form (6.3): to
see this it suffices to note that (6.1) frequently has only a finite number of roots
(cf. also below). It should also be noted that, the series (6.3) having regularity
properties as are assumed in Lotka's theory, any function representable in the
form (6.3) is necessarily a solution of the integral equation (2.4), whereas the
theory requires us to construct a solution $u(t)$ which reduces to an arbitrarily
prescribed function $h(t)$ for $t < 0$ (which frequently is an empirical function,
determined by observations). Nevertheless, it is possible to give sound foundations
to Lotka's method so that it can be used (with some essential limitations
and modifications) sometimes even in cases for which it originally was not
intended. For this purpose it turns out to be necessary that all considerations
be based on the more general equation (1.1), instead of (2.4) (cf. also section 2).

Before proceeding it is necessary to make clear what is really meant by a root of
(6.1). The function $\varphi(s)$ is defined by (3.2), and the integral will in general
converge only for $s$-values situated in the half-plane $\Re(s) > \sigma$. Usually only
roots situated in this half-plane are considered¹⁸. It is also argued that $\varphi(s)$
is, for real $s$, a monotone function, so that (6.1) has at most one real root:
accordingly the terms of (6.2) are called "oscillatory components." However,
the function $\varphi(s)$ can usually be defined by analytic continuation even outside
the half-plane $\Re(s) > \sigma$, and, if this is done, (6.1) will in general also have roots
in the half-plane $\Re(s) < \sigma$. It will be seen in the sequel that these roots play
exactly the same role for the solution $u(t)$ as the other ones, and that the
applicability of Lotka's method depends on the behavior of $\varphi(s)$ in the entire
complex $s$-plane. It may be of interest to quote an example where (6.1) has
infinitely many real and no other roots.

Example¹⁹: Let

(6.4) $f(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} e^{-1/(4t)}, \qquad t > 0;$


18 This was stated in particular by Hadwiger [3] and Hadwiger and Ruchti [6];
accordingly the results of the latter paper (obtained by methods quite different from Lotka's)
need some modifications.

19 Cf. the example at the end of section 4. A function closely related to (6.4)
plays an important role in two recent papers by Hadwiger [4] and [5]. Hadwiger's
conclusion, if it could be justified, would fundamentally change the aspect of the whole theory.
The conclusion reached by Hadwiger seems to be that for any biological population the
reproduction function should be of the form $u(t) = \sum u_n(t)$, where $u_n(t)$ represents the
contribution of the $n$th generation and

(*) $u_n(t) = [\ldots]$

Here $a$, $A$ and $C$ are constants. Clearly (*) is a generalisation of (6.4). Now his conclusion
is based on the arbitrary assumption that $u_n(t)$ should be of the form $u_n(t) = \Phi(t, na)$,

It is easily seen that $\varphi(s) = e^{-\sqrt{s}}$. The integral (3.2) converges only for $\Re(s) \ge 0$,
but $\varphi(s)$ is defined as a two-valued function in the entire $s$-plane. The roots of
(6.1) are obviously $s_k = -4k^2\pi^2$, so that all of them are real and simple. If
$g(t) = f(t)$, we get by (3.4)

$\omega(s) = \frac{e^{-\sqrt{s}}}{1 - e^{-\sqrt{s}}} = \sum_{k=1}^{\infty} e^{-k\sqrt{s}}, \qquad s \text{ real}, > 0.$

Now $e^{-k\sqrt{s}}$ is the Laplace transform of $\frac{k}{2\sqrt{\pi}}\, t^{-3/2} e^{-k^2/(4t)}$, and hence it is readily
seen that the solution $u(t)$ can be written in the form

(6.5) $u(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} \sum_{k=1}^{\infty} k\, e^{-k^2/(4t)};$

of course, this expansion is not of form (6.2) and shows no oscillatory character.
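
The statement about the roots can be checked numerically on one branch of the two-valued
function. The sketch below uses the principal square root from Python's `cmath`; the helper
name `phi` and the range of $k$ are illustrative choices, and the root $k = 0$ is included
only because $\varphi(0) = 1$.

```python
import cmath
import math


def phi(s):
    """One branch of phi(s) = exp(-sqrt(s)), taken with the principal square root."""
    return cmath.exp(-cmath.sqrt(s))


# candidate roots s_k = -4 k^2 pi^2 on the negative real axis
roots = [-4.0 * k * k * math.pi ** 2 for k in range(0, 6)]

# |phi(s_k) - 1| should vanish up to rounding, since sqrt(s_k) = 2*pi*k*i
residuals = [abs(phi(complex(s, 0.0)) - 1.0) for s in roots]

for s, r in zip(roots, residuals):
    print(f"s_k = {s:12.4f}   |phi(s_k) - 1| = {r:.2e}")
```

All these roots lie outside the half-plane of convergence of the integral, which is the
point of the example.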

From now on we shall consistently denote by $\omega(s)$ the function defined by the
integral (3.4) and by the usual process of analytic continuation; accordingly we
shall take into consideration all roots of (6.1). The main limitation of Lotka's
theory can then be formulated in the following way: Lotka's method depends
only on the function $g(t)$ and on the roots of (6.1). Now two different functions
$f(t)$ can lead to characteristic equations having the same roots. Lotka's method
would be applicable to both only if the corresponding two integral equations
(1.1) had the same solution $u(t)$. This, however, is not necessarily the case.
Thus, if Lotka's method is applied, and if all computations are correctly
performed, and if the resulting series for $u(t)$ converges uniformly, there is no
possibility of telling which equation is really satisfied by the resulting $u(t)$:
it can happen that one has unwittingly solved some unknown equation of type
(1.1) which, by chance, leads to a characteristic equation having the same roots
as the characteristic equation of the integral equation with which one was really
concerned. Indeed this happens in the following example which is familiar in
connection with our problem. It is illustrative also for other purposes: thus it
shows not only limitations of Lotka's method, but also that this method can be
modified so as to become applicable in some cases where the characteristic
equation has only a finite number of roots.


where $\Phi(x, a)$ is independent of $n$. To my mind Hadwiger's result shows only the
impracticability of this axiom. However, Hadwiger's result is not correct even under his
assumption. Indeed, he derives for $\Phi(x, a)$ the functional equation

(**) $\Phi(x, a + b) = \int_0^x \Phi(x - \xi, a)\, \Phi(\xi, b)\,d\xi,$

which is well-known from the theory of stochastic processes. Now Hadwiger merely
verifies the known result that (*) leads to a solution of (**). However, (**) has infinitely
many other solutions (it is possible to write down expressions for their Laplace transforms,
although it is difficult to express the solutions themselves explicitly). This, of course,
renders Hadwiger's result illusory.

Example: Pearson type III-curves.²⁰ Consider the integral equation (1.1)
in the following two cases:

(I) $f(t) = g(t) = f_I(t) = \frac{2}{\sqrt{\pi}}\, t^{1/2} e^{-t}$

and

(II) $f(t) = g(t) = f_{II}(t) = \frac{1}{2}\, t^2 e^{-t}.$

It is readily seen (and well known) that the corresponding Laplace transforms are

(I) $\varphi_I(s) = \frac{1}{(s + 1)^{3/2}}$

and

(II) $\varphi_{II}(s) = \frac{1}{(s + 1)^3},$

respectively. Thus in both cases the characteristic equation has the same roots
namely

$s_1 = 0, \qquad s_2 = -\frac{3}{2} + \frac{i}{2}\sqrt{3}, \qquad s_3 = -\frac{3}{2} - \frac{i}{2}\sqrt{3},$

of which only the first one lies in the half-plane of convergence of the integral
(3.4). Lotka's method is not applicable since there are only three roots. However,
in the second case, an expansion of type (6.2) is possible. Indeed, we have
by (3.4)

$\omega_{II}(s) = \frac{\varphi_{II}(s)}{1 - \varphi_{II}(s)} = \frac{1}{s^3 + 3s^2 + 3s} = \frac{1}{3}\,\frac{1}{s} - \frac{\frac{1}{6} - \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} - \frac{i}{2}\sqrt{3}} - \frac{\frac{1}{6} + \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} + \frac{i}{2}\sqrt{3}};$

now $1/(s + a)$ is the Laplace transform of $e^{-at}$, and hence we obtain the solution
$u(t)$ in the form

$u_{II}(t) = \frac{1}{3} - \left( \frac{1}{6} - \frac{i}{2\sqrt{3}} \right) e^{-\left(\frac{3}{2} - \frac{i}{2}\sqrt{3}\right)t} - \left( \frac{1}{6} + \frac{i}{2\sqrt{3}} \right) e^{-\left(\frac{3}{2} + \frac{i}{2}\sqrt{3}\right)t},$

20 General Pearson curves have been investigated recently in connection with (1.1) by
Brown [1], Hadwiger and Ruchti [6] and Rhodes [15]. Hadwiger and Ruchti use a method
of their own, but they are also led to the study of the characteristic equation (6.1) in a
slightly disguised form: their result needs a modification since they arbitrarily drop the
roots lying in the half-plane of divergence of the integral $\varphi(s)$.

which is an expansion of type (6.2). In the first of the above examples we get
for real positive $s$

$\omega_I(s) = \frac{\varphi_I(s)}{1 - \varphi_I(s)} = \sum_{n=1}^{\infty} \frac{1}{(s + 1)^{3n/2}},$

and it is readily seen that this is the Laplace transform of the solution

$u_I(t) = e^{-t} \sum_{n=1}^{\infty} \frac{t^{(3n-2)/2}}{\Gamma(3n/2)}.$

The series is convergent for $t > 0$, but obviously this solution cannot be
represented in a form similar to (6.2).

A similar remark applies to the general Pearson-type III curve

$f(t) = A\, t^{\beta} e^{-\alpha t},$

where $A$, $\alpha$, $\beta$ are positive constants; the corresponding Laplace transform is

$\varphi(s) = A\,\Gamma(\beta + 1)\, \frac{1}{(s + \alpha)^{\beta + 1}}.$


These preparatory remarks enable us to formulate rigorous conditions for the
existence of an expansion of type (6.2). The following theorem shows the limits
of Lotka's method, but at the same time it also represents an extension of it.
In the formulation of the theorem we have considered only the case of absolute
convergence of (6.2). This was done to avoid complications lacking any practical
significance whatsoever. The conditions can, of course, be relaxed along
customary lines.

Theorem 6: In order that the solution $u(t)$ of Theorem 2 be representable in
form (6.2), where the series converges absolutely for $t \ge 0$ and where the $s_k$ denote the
roots of the characteristic equation²¹ (6.1), it is necessary and sufficient that the
Laplace transform $\omega(s)$ admit an expansion

(6.6) $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)} = \sum_k \frac{A_k}{s - s_k}$

and that $\sum |A_k|$ converges. The coefficients $A_k$ are determined by

(6.7) $A_k = -\frac{\gamma(s_k)}{\varphi'(s_k)}.$

In particular, it is necessary that $\omega(s)$ be a one-valued function.²²

Proof: All roots $s_k$ of (6.1) satisfy the inequality $\Re(s_k) \le \sigma'$, where $\sigma'$ was
defined in Theorem 2. It is therefore readily seen that in case $\sum |A_k|$ converges,
the Laplace transform of (6.2) can be computed for sufficiently large


21 The number of roots may be finite or infinite. It should also be noted that it is not
required that [...]. If the $s_k$ have a point of accumulation, $\omega(s)$ will have an essential
singularity. That this actually can happen can be shown by examples.

22 This was not so in our example I.

positive $s$-values by termwise integration so that (6.6) certainly holds for
sufficiently large positive $s$. Now with $\sum |A_k|$ converging, (6.6) defines $\omega(s)$
uniquely for all complex $s$ (with singularities at the points $s_k$ and the points of
accumulation of $s_k$, if any). Since the analytic continuation is unique, it follows
that (6.6) holds for all $s$. The series $\sum |A_k|$ must, of course, converge if (6.2)
is to converge absolutely for $t = 0$, and this proves the necessity of our condition.
Conversely, if $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$ is given by (6.6), and if $\sum |A_k|$ converges,
then $\omega(s)$ is the Laplace transform of a function $u(t)$ defined by (6.2).
Since the Laplace transform is unique, $u(t)$ is the solution of (1.1) by Theorem 2.
The series (6.2) converges absolutely for $t \ge 0$ since $|A_k e^{s_k t}| \le |A_k|\, e^{\sigma' t}$.

Finally (6.7) follows directly from (6.6).

It is interesting to compare (6.7) with formulas (50) and (56) of Lotka's
paper [8]. Lotka considers the special case $g(t) = f(t)$; in this case
$\gamma(s_k) = \varphi(s_k) = 1$, and (6.7) reduces to $A_k = -\dfrac{1}{\varphi'(s_k)}$. If $s_k$ lies in the
domain of convergence of the integral $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt$, that is, if
$\Re(s_k) > \sigma$, then

(6.8) $$\frac{1}{A_k} = \int_0^\infty e^{-s_k t}\, t f(t)\,dt,$$

in accordance with Lotka's result. However, (6.8) becomes meaningless for the
roots with $\Re(s_k) < \sigma$, whereas (6.7) is applicable in all cases.

Theorem 6 can easily be generalized to the case where the characteristic equation
has multiple roots. The expansion (6.6) (which reduces to the customary
expansion into partial fractions whenever $\omega(s)$ is meromorphic) is to be replaced by

(6.9) $$\omega(s) = \sum_k \left\{ \frac{A_k^{(1)}}{s - s_k} + \frac{A_k^{(2)}}{(s - s_k)^2} + \cdots + \frac{A_k^{(m_k)}}{(s - s_k)^{m_k}} \right\},$$

where $m_k$ is the multiplicity of the root $s_k$. This leads us formally to an
expansion

(6.10) $$u(t) = \sum_k e^{s_k t} \left\{ A_k^{(1)} + A_k^{(2)} \frac{t}{1!} + \cdots + A_k^{(m_k)} \frac{t^{m_k - 1}}{(m_k - 1)!} \right\},$$

which now replaces (6.2). Generalizing Theorem 6 it is easy to formulate some
simple conditions under which (6.10) will really represent a solution of (1.1).
Other conditions which ensure that (6.9) is the transform of (6.10) are known
from the general theory of Laplace transforms; such conditions usually use only
function-theoretical properties of (6.9) and are applicable in particular when
$\omega(s)$ is meromorphic. We mention in particular a theorem of Churchill [17]
which can be used for our purposes.


7. On the practical computation of the solution. There are at hand two main
methods for the practical computation of the solution of (1.1). One of them
has been developed by Lotka and consists of an approximate computation of a
few coefficients in the series (6.2). The other method uses an expansion

(7.1) $$u(t) = \sum_{n=0}^{\infty} u_n(t),$$

where $u_n(t)$ represents the contribution of the $n$th "generation" and is defined
by

(7.2) $$u_0(t) = g(t), \qquad u_{n+1}(t) = \int_0^t u_n(t - x) f(x)\,dx.$$

Now the Laplace transform of $u_n(t)$ is $\gamma(s)\varphi^n(s)$, and hence (7.2) corresponds
to the expansion

(7.3) $$\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)} = \gamma(s) \sum_{n=0}^{\infty} \varphi^n(s).$$

In practice the functions $g(t)$ and $f(t)$ are usually not known exactly. Frequently
their values are obtained from some statistical material, so that only
their integrals over some time units, e.g. years, are actually known or, in other
words, only the values

(7.4) $$f_n = \frac{1}{\delta}\int_{n\delta}^{(n+1)\delta} f(t)\,dt, \qquad g_n = \frac{1}{\delta}\int_{n\delta}^{(n+1)\delta} g(t)\,dt,$$

are given, where $\delta > 0$ is a given constant. Ordinarily in such cases some
theoretical forms (e.g. Pearson curves) are fitted to the empirical data and
equation (1.1) is solved with these theoretical functions. Now such a procedure
is sometimes not only very troublesome, but also somewhat arbitrary. Consider
for example the limit of $u(t)$ as $t \to \infty$; this asymptotic value is the main
point of interest of the theory and all practical computations. However, as has
been shown above, this limit depends only on the moments of the first two
orders of $f(t)$ and $g(t)$, and, unless the fitting is done by the method of moments,
the resulting value will depend on the special procedure of fitting. Accordingly
it will sometimes happen that it is of advantage to use the empirical material
as it is, and this can, at least in principle, always be done.

If only the values (7.4) are used it is natural to consider $f(t)$ and $g(t)$ as
step-functions defined by

(7.5) $$f(t) = f_n, \quad g(t) = g_n \qquad \text{for } n\delta \le t < (n + 1)\delta.$$

In practice only a finite number among the $f_n$ and $g_n$ will be different from zero:
accordingly the Laplace transforms $\gamma(s)$ and $\varphi(s)$ reduce to trigonometrical
polynomials, so that the analytic study of $\omega(s) = \dfrac{\gamma(s)}{1 - \varphi(s)}$ becomes particularly
simple. Lotka's method can be applied directly in this case.
For a convenient computation of (7.1) it is better to return to the more general
equation (1.3), instead of (1.1). The summatory functions $F(t)$ and $G(t)$ should
not be defined by (1.2) in this case, but simply by

(7.6) F(t) = iJ/„, 

n»0 n«0 

It is readily seen that the solution $U(t)$ of (1.3) can be written in the form
$U(t) = \sum_{n=0}^{\infty} U_n(t)$, where

$$U_0(t) = G(t), \qquad U_{n+1}(t) = \int_0^t U_n(t - x)\,dF(x);$$

in our case $U_n(t)$ will again be a step-function with jumps at the points $k\delta$, the
corresponding saltus being

* 

Wo ** Qh y W n +l — / ^ Jr • 

r-0 

Thus we arrive at exactly the same result as would have been obtained if the
integrals (7.2) had been computed, starting from (7.4), by the ordinary methods
for numerical integration of tabulated functions. It is of interest to note that
this method of approximate evaluation of the integrals (7.2) leads to the exact
values of the renewal function of a population where all changes occur in a
discontinuous way at the end of time intervals of length $\delta$ in such a way that each
change equals the mean value of the changes of the given population over the
corresponding time interval.
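
The generation-by-generation computation described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the time unit is taken as $\delta = 1$, `f[r]` and `g[k]` are read as the amounts attached to the $r$th and $k$th interval, and `f[0] = 0` is assumed so that only finitely many generations reach a given interval. The function name `generation_expansion` is hypothetical and does not come from the paper.

```python
def generation_expansion(f, g, horizon):
    """Sum the generation terms u_0 + u_1 + ... on the grid k = 0..horizon-1.

    Assumptions (not from the paper): unit time step, f[0] == 0.
    u_0[k] = g[k];  u_{n+1}[k] = sum_{r=0..k} u_n[k-r] * f[r].
    """
    if f and f[0] != 0:
        raise ValueError("this sketch assumes f[0] == 0")

    def pad(seq):
        # Extend (or cut) a sequence to exactly `horizon` entries.
        return [float(seq[k]) if k < len(seq) else 0.0 for k in range(horizon)]

    f = pad(f)
    current = pad(g)          # generation 0
    total = current[:]
    for _ in range(horizon - 1):
        # Discrete convolution of the current generation with f.
        nxt = [sum(current[k - r] * f[r] for r in range(k + 1))
               for k in range(horizon)]
        if not any(nxt):
            break
        total = [a + b for a, b in zip(total, nxt)]
        current = nxt
    return total
```

For example, `generation_expansion([0.0, 0.5, 0.5], [1.0], 6)` sums the six generations that can contribute on the first six intervals; the result satisfies the discrete renewal relation $u_k = g_k + \sum_r f_r u_{k-r}$ on that grid.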
ON THE JOINT DISTRIBUTION OF THE MEDIANS IN SAMPLES
FROM A MULTIVARIATE POPULATION

By A. M. Mood 
University of Texas 

It is well known [1] that in the case of a population having a single variate
distributed according to a density function satisfying certain general conditions,
the median of a sample is asymptotically normally distributed about the population
median as a mean. It is the purpose of this paper to extend this result to
populations involving more than one variate. Besides the theoretical interest
of such a result, there may be some practical value in it when one is dealing with
samples from a population for which the median is a more efficient statistic than
the mean, as, for example, when the population variance is not finite.

The complexity of the exact distribution of the sample median increases
rapidly with the number of variates which describe the population; it is almost
impossible to write out completely the distribution for the general case of $k$
variates. For this reason the author has chosen to give first a detailed presentation
for the case of two variates, then use a condensed notation to establish the
general result. This is a circuitous route, but it seems to be the only feasible one.
A condensed notation is necessary for the general case, but presented alone it
would be well-nigh incomprehensible.


1. Distribution of the median in two dimensions. An extension of A. T.
Craig's [2] geometrical argument will be used to obtain the exact distribution of
the sample median. Let us consider two variates $x_1$ and $x_2$ with density function
$f(x_1, x_2)$ which shall satisfy the following conditions:

1. f(x i, Xi) > 0 

LAr x, ) dz ' ■ L m * ,+0 (s) 

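3. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$


4. Each of the equations

$$\int_{-\infty}^{\xi_1}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1 = \tfrac12, \qquad \int_{-\infty}^{\xi_2}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = \tfrac12$$

has a unique real root.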
If $\xi_1$ and $\xi_2$ are the respective roots of the two equations of this last condition
then the point $(\xi_1, \xi_2)$ is defined to be the population median. It will be assumed
in what follows that the coordinate system has been so chosen that $\xi_i = 0$.

Let a sample of $2n + 1$ elements $(x_{1\alpha}, x_{2\alpha})$ $(\alpha = 1, 2, \dots, 2n + 1)$ be drawn
from this population. The sample median $(\tilde{x}_1, \tilde{x}_2)$ will be defined as an element
(not necessarily in the sample) whose $x_1$ coordinate is the middle, with respect
to magnitude, number of the set of numbers $x_{1\alpha}$, and whose $x_2$ coordinate is the
middle number of the set of numbers $x_{2\alpha}$. Now let us compute the probability
that the sample median will lie in the rectangle

$$\tilde{x}_i - \tfrac12\, d\tilde{x}_i < x_i < \tilde{x}_i + \tfrac12\, d\tilde{x}_i, \qquad i = 1, 2.$$

This rectangle will be denoted by $R''$. The remainder of the plane will be divided
into eight other regions $R_1, \dots, R_4'$ as indicated by the dotted lines in figure 1.
The probability that an element will fall in the region $R_i^{(j)}$ will be denoted by

$$p_i^{(j)} = \int\!\!\int_{R_i^{(j)}} f(x_1, x_2)\,dx_1\,dx_2.$$



Fig. 1. The regions $R_1, \dots, R_4$, $R_1', \dots, R_4'$ and the central rectangle $R''$.


Neglecting terms involving differentials of higher order we have 

J - O0 ft 90 

I f(xi, x t ) dxidxi 
*1 ^*1 

Pi = f f. f(x 1, Xt) dx 2 dx 1 
*L.eo 


( 1 ) 


p' - / f(xi, it) dxidit 


p” St) d&idit- 
We shall consider now that the sample is drawn from a multinomial population
with probabilities $p_1, \dots, p''$ and pick out those terms which give rise to a
sample median in $R''$. If the median is an element of the sample, then that
element must fall in $R''$ and the other elements must fall in the regions $R_1$,
$R_2$, $R_3$, and $R_4$ in such a manner that

$$n_1 + n_2 = n_3 + n_4 = n,$$

$$n_1 + n_4 = n_2 + n_3 = n,$$

or so that

(2) $$n_1 = n_3 \quad\text{and}\quad n_2 = n_4,$$

where $n_i$ is the number of elements in $R_i$. The probability that this occurs is

(3) $$\sum_{n_1 + n_2 = n} \frac{(2n+1)!}{n_1!^2\, n_2!^2}\; p''\, p_1^{n_1} p_2^{n_2} p_3^{n_1} p_4^{n_2}.$$


Now suppose the median is determined by two different elements of the sample,
for example one in $R_1'$ and one in $R_2'$; then there must be $n_1$ elements in $R_1$,
$n_1 + 1$ elements in $R_3$, and $n_2$ elements in each of $R_2$ and $R_4$ with

(4) $$n_1 + n_2 = n - 1.$$

The probability in this case is

(5) $$\sum_{n_1 + n_2 = n - 1} \frac{(2n+1)!}{n_1!\,(n_1+1)!\,n_2!^2}\; p_1' p_2'\, p_1^{n_1} p_2^{n_2} p_3^{n_1+1} p_4^{n_2}.$$


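Continuing in this manner we obtain the distribution of the median, and letting
$D(\tilde{x}_1, \tilde{x}_2)$ represent the density function giving this distribution we have

(6) $$\begin{aligned}
D(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 ={}& p'' \sum \frac{(2n+1)!}{n_1!^2\,n_2!^2}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2} \\
&+ (p_1'p_2'p_3 + p_3'p_4'p_1) \sum \frac{(2n+1)!}{n_1!\,(n_1+1)!\,n_2!^2}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2} \\
&+ (p_2'p_3'p_4 + p_4'p_1'p_2) \sum \frac{(2n+1)!}{n_1!^2\,n_2!\,(n_2+1)!}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2}.
\end{aligned}$$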


2. Asymptotic distribution of the median in two dimensions. As a simple
notation

$$A = B\,(1 + O(1/\sqrt{n}))$$

will be abbreviated to read

(7) $$A \;=\!\cdot\; B,$$

the dot after the equality sign indicating the omission of the factor $1 + O(1/\sqrt{n})$.
As is customary, the second term of this factor represents any function such that

$$\lim_{N \to \infty} N\, O(1/N) = L < \infty.$$

In order to get an approximation to (6) for large $n$ we shall use the normal
approximation for the multinomial distribution and compute the sums (these
cannot be put in finite form) by integration. We use then the well-known result

(8) $$m! \prod_{i=1}^{r} \frac{p_i^{m_i}}{m_i!} \;=\!\cdot\; \left[\frac{A}{(2\pi)^{r-1}}\right]^{1/2} \exp\Big(-\tfrac12 \sum_{i,j=1}^{r-1} A_{ij} z_i z_j\Big) \prod_{i=1}^{r-1} dz_i,$$

where

(9) $$z_i = (m_i - m p_i)/\sqrt{m}, \qquad i = 1, 2, \dots, r - 1,$$

(10) $$A_{ii} = \frac{1}{p_i} + \frac{1}{p_r}, \qquad A_{ij} = \frac{1}{p_r} \quad (i \ne j).$$

Returning to (6) it is to be noted that the fraction immediately following $\sum$
in the first sum has one more factor in the denominator than the corresponding
fractions in the other sums. This first sum may therefore be neglected in the
asymptotic form as it is of order $1/n$ in comparison with the others. We consider
now the second sum in (6) and let it be represented by the letter $S$

(11) $$S = 2n(2n+1)\,p_1'p_2' \sum_{n_1 + n_2 = n - 1} \frac{(2n-1)!}{n_1!\,(n_1+1)!\,n_2!^2}\; p_1^{n_1} p_2^{n_2} p_3^{n_1+1} p_4^{n_2}.$$

Employing (8) and omitting certain terms of order $1/n$ we have

(12) $$S \;=\!\cdot\; 4n^2 p_1' p_2' \sum \left[\frac{A}{(2\pi)^3}\right]^{1/2} \exp\Big(-\tfrac12 \sum_{i,j=1}^{3} A_{ij} z_i z_j\Big)\, dz_1\, dz_2\, dz_3,$$

in which the $A_{ij}$ are defined by (10) with $r = 4$, and

(13) $$z_i = (n_i - 2n p_i)/\sqrt{2n}, \qquad i = 1, 2, 3.$$

In view of the relations (2) between the n< we have 

Z 2 = V2n (h - Pi - Pi) - «i - ui - 2l 

(14) _ 

2» = V2n (pi — pt) — zi = u% — Zi, 


in which relations we have defined the new symbols $u_1$ and $u_2$. It will be recalled
that in (8) the factors $dz_i$ correspond to factors $1/\sqrt{m}$; we therefore let $dz_2$ and
$dz_3$ in (12) cancel a factor $2n$ from the coefficient of the exponential, and after
substituting (14) in (12) find that


( 15 ) 


5 = - 2np',pi 2 [A/(2t)*|* exp j— + ^ + ^ + 


\ P* Pi Pi) 


+ 


(ui + tfr)* 
P* 


+ 
The summation can now be performed to within terms of order $1/\sqrt{n}$ by
integration with respect to $z_1$ between the limits $-\infty$ and $+\infty$; this gives us


( 16 ) 




+-+-Y 

Pi Pa 


■{-»[ 


(ui + Utf 
V « 


+ ^ _(*± 2!!_!5 + »Y/(A +1+1 + 1)]}. 

Pi \ Pi Jh Pi/ / \Pi Pi Pi Pi/ JJ 


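At this point some new symbols are required. We let $q_i$ and $q_i'$ represent the
results of replacing $\tilde{x}_1$ and $\tilde{x}_2$ by zero in the integrals of the relations (1)

(17) $$\begin{aligned}
q_1 &= \int_0^{\infty}\!\!\int_0^{\infty} f(x_1, x_2)\,dx_1\,dx_2, & q_1' &= \int_0^{\infty} f(x_1, 0)\,dx_1, \\
q_2 &= \int_0^{\infty}\!\!\int_{-\infty}^{0} f(x_1, x_2)\,dx_1\,dx_2, & q_2' &= \int_0^{\infty} f(0, x_2)\,dx_2, \\
q_3 &= \int_{-\infty}^{0}\!\!\int_{-\infty}^{0} f(x_1, x_2)\,dx_1\,dx_2, & q_3' &= \int_{-\infty}^{0} f(x_1, 0)\,dx_1, \\
q_4 &= \int_{-\infty}^{0}\!\!\int_0^{\infty} f(x_1, x_2)\,dx_1\,dx_2, & q_4' &= \int_{-\infty}^{0} f(0, x_2)\,dx_2;
\end{aligned}$$

then

(18) $$q_1 + q_2 = q_3 + q_4 = q_1 + q_4 = q_2 + q_3 = \tfrac12$$

and

(19) $$q_1 = q_3, \qquad q_2 = q_4.$$

Also we let

(20) $$a_1 = q_2' + q_4', \qquad a_2 = q_1' + q_3',$$

(21) $$y_1 = \sqrt{2n}\,a_1\tilde{x}_1, \qquad y_2 = \sqrt{2n}\,a_2\tilde{x}_2.$$

We have now

(22) $$\begin{aligned}
p_i &\;=\!\cdot\; q_i, & i &= 1, 2, 3, 4, \\
p_i' &\;=\!\cdot\; q_i'\,d\tilde{x}_2, & i &= 1, 3, \\
p_i' &\;=\!\cdot\; q_i'\,d\tilde{x}_1, & i &= 2, 4.
\end{aligned}$$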
Also

(23) $$u_1 = \sqrt{2n}\,(\tfrac12 - p_1 - p_2) \;=\!\cdot\; \sqrt{2n}\,\tilde{x}_2 \int_{-\infty}^{\infty} f(x_1, 0)\,dx_1 = \sqrt{2n}\,a_2\tilde{x}_2 = y_2.$$

Similarly,

(24) $$u_2 = \sqrt{2n}\,(p_1 - p_3) \;=\!\cdot\; -(y_1 + y_2).$$

The result of substituting (22), (23) and (24) in (16) with some further simplification
using (18) and (19) is


(25) $$S \;=\!\cdot\; \frac{2n\,q_1'q_2'}{2\pi\sqrt{q_1q_2}} \exp\left\{-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\right\} d\tilde{x}_1\,d\tilde{x}_2.$$

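The other three sums of (6) will give rise to the same expression except that the
factors $q_1'q_2'$ will be different; it is clear then that

$$D(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 \;=\!\cdot\; \frac{2n\,(q_1'q_2' + q_2'q_3' + q_3'q_4' + q_4'q_1')}{2\pi\sqrt{q_1q_2}} \exp\left\{-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\right\} d\tilde{x}_1\,d\tilde{x}_2$$

(26) $$= \frac{2n\,a_1a_2}{2\pi\sqrt{q_1q_2}} \exp\left\{-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\right\} d\tilde{x}_1\,d\tilde{x}_2$$

(27) $$= \frac{2n\,a_1a_2}{2\pi\sqrt{q_1q_2}} \exp\left\{-n\,\frac{a_1^2\tilde{x}_1^2 - 4(q_1 - q_2)\,a_1a_2\tilde{x}_1\tilde{x}_2 + a_2^2\tilde{x}_2^2}{4q_1q_2}\right\} d\tilde{x}_1\,d\tilde{x}_2.$$

This is the asymptotic form for the distribution of the median in two dimensions.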
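
The quadratic form in the exponent of the two-dimensional limit can be inverted numerically to read off the variances and covariance of the sample median. The sketch below assumes the exponent $-n\,(a_1^2\tilde{x}_1^2 - 4(q_1 - q_2)a_1a_2\tilde{x}_1\tilde{x}_2 + a_2^2\tilde{x}_2^2)/(4q_1q_2)$ and simply inverts the corresponding $2 \times 2$ precision matrix; the function name `median_covariance` is hypothetical.

```python
import math

def median_covariance(n, a1, a2, q1, q2):
    """Covariance terms implied by the bivariate normal limit of the median.

    The density is taken proportional to exp(-x^T P x / 2) with
    P = (2n / (4 q1 q2)) * [[a1^2, -c a1 a2], [-c a1 a2, a2^2]],
    c = 2 (q1 - q2).  Returns (var_1, cov_12, var_2) = entries of P^{-1}.
    """
    c = 2.0 * (q1 - q2)
    k = 2.0 * n / (4.0 * q1 * q2)
    p11, p12, p22 = k * a1 * a1, -k * c * a1 * a2, k * a2 * a2
    det = p11 * p22 - p12 * p12
    # Inverse of a symmetric 2x2 matrix.
    return p22 / det, -p12 / det, p11 / det
```

As a check of the sketch, inserting the normal-population values $q_1 = \tfrac14 + \tfrac{1}{2\pi}\sin^{-1}\rho$, $q_2 = \tfrac14 - \tfrac{1}{2\pi}\sin^{-1}\rho$ and $a_i = 1/(\sqrt{2\pi}\sigma_i)$ gives the familiar variance $\pi\sigma_i^2/(4n)$ for each coordinate of the median of $2n + 1$ observations.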


3. Distribution of the median in k dimensions. We consider now a population
characterized by a density function $f(x_1, \dots, x_k)$ defined over a euclidean space
of $k$ dimensions satisfying conditions like those required of $f(x_1, x_2)$ in section 1,
and we assume that the population median is at the origin so that the integral
of the density function over any half-space determined by a coordinate hyperplane
is $\tfrac12$.

A sample of $2n + 1$ elements will have a median $(\tilde{x}_1, \dots, \tilde{x}_k)$ each coordinate
of which is the middle number of the set of numbers giving the corresponding
coordinate of the elements of the sample. To obtain the probability that the
sample median lies in the hyperparallelepiped $\tilde{x}_\alpha - \tfrac12 d\tilde{x}_\alpha < x_\alpha < \tilde{x}_\alpha + \tfrac12 d\tilde{x}_\alpha$
$(\alpha = 1, 2, \dots, k)$, we divide the space into $3^k$ regions by means of hyperplanes
perpendicular to the coordinate axes through the points $\tilde{x}_\alpha \pm \tfrac12 d\tilde{x}_\alpha$ on the
coordinate axes. These regions are illustrated in Figure 2 for the case of three
dimensions. The coordinate axes have been omitted in this figure. There
will be $2^k$ primary regions denoted by $R_1, R_2, \dots, R_{2^k}$ corresponding to the
octants of the figure; $k2^{k-1}$ regions with one differential dimension denoted by
$R_1', R_2', \dots, R'_{k2^{k-1}}$ corresponding to the quarter slabs of the figure; $\binom{k}{2}2^{k-2}$
regions with two differential dimensions corresponding to the half strips of the
figure, and so forth. Probabilities associated with these regions are defined by

$$p_i^{(j)} = \int_{R_i^{(j)}} f(x_1, \dots, x_k)\,dx_1 \cdots dx_k.$$

If the sample median is determined by $k$ different elements of the sample there
will be one of these $k$ elements in each of $k$ regions $R_i'$ whose differential dimensions
are mutually orthogonal and the other elements of the sample will fall in
the regions $R_i$ in such a way that $n$ elements of the sample will lie on either
side of any of the $k$ hyperplanes $x_\alpha = \tilde{x}_\alpha$. The probability of this occurrence
for a particular choice of $k$ of the regions $R_i'$ is


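(28) $$\sum \frac{(2n+1)!}{\prod_{i=1}^{2^k} n_i!}\; \prod_{\alpha=1}^{k} p_{i_\alpha}' \prod_{i=1}^{2^k} p_i^{n_i},$$

in which the $2^k$ indices $n_i$ are subject to $k$ independent restrictions of the type

(29) $$\sum{}' n_i = n - c_\alpha,$$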
where $c_\alpha$ is an integer such that $0 \le c_\alpha < k$, and the prime on $\sum$ indicates that
the sum is to be taken over all $n_i$ on one side of a hyperplane $x_\alpha = \tilde{x}_\alpha$. $n_i$ is
the number of elements in $R_i$ and besides the $k$ conditions (29) we have also

(30) $$\sum_{i=1}^{2^k} n_i = 2n - k + 1.$$

In order to include all ways in which the median is determined by $k$ different
elements of the sample we must add together $2^{k(k-1)}$ sums of the type (28). If
the median is determined by less than $k$ elements, say $k - h$ elements, then the
fraction $(2n + 1)!/\prod n_i!$ will have $h$ extra factors in the denominator and hence
the sum will be of order $1/n^h$ as compared with that of (28) and may be neglected
in obtaining an asymptotic expression.

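Thus we need only find the limiting form of (28)

$$S = (2n + 1)(2n) \cdots (2n - k + 2) \prod_{\alpha=1}^{k} p_{i_\alpha}' \sum \frac{(2n - k + 1)!}{\prod n_i!} \prod_{i=1}^{2^k} p_i^{n_i},$$

which after substituting (8) and neglecting terms of lower order becomes

(31) $$S \;=\!\cdot\; (2n)^k \prod_{\alpha=1}^{k} p_{i_\alpha}' \sum \left[\frac{A}{(2\pi)^{2^k - 1}}\right]^{1/2} \exp\Big(-\tfrac12 \sum A_{ij} z_i z_j\Big) \prod_{i=1}^{2^k - 1} dz_i,$$

in which the $A_{ij}$ are defined by (10) with $r = 2^k$ and

(32) $$z_i = (n_i - 2np_i)/\sqrt{2n}, \qquad i = 1, 2, \dots, 2^k - 1.$$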
Now we define

(33) $$u_\alpha = \sqrt{2n}\,\Big(\tfrac12 - \sum{}' p_i\Big), \qquad \alpha = 1, 2, \dots, k,$$

the $\sum'$ having the same significance as in (29). These conditions (29) may now
be put in the form

$$z_\alpha \;=\!\cdot\; u_\alpha - L_\alpha(z),$$

in which $L_\alpha(z)$ is a sum of a certain subset of the variables $z_{k+1}, \dots, z_{2^k - 1}$.
Care must be taken in labeling the regions $R_i$ in order to be able to solve for
$z_1, \dots, z_k$ in this form. After substituting these relations in (31) we replace
$\prod_{\alpha=1}^{k} dz_\alpha$ by $(1/2n)^{k/2}$ and perform the summation to within terms of order $1/\sqrt{n}$
by integrating the remaining $z_i$ from $-\infty$ to $+\infty$; the result is

(34) $$S \;=\!\cdot\; (2n/2\pi)^{k/2} \prod_{\alpha=1}^{k} p_{i_\alpha}' \sqrt{B}\, \exp\Big(-\tfrac12 \sum_{\alpha,\beta=1}^{k} B_{\alpha\beta} u_\alpha u_\beta\Big),$$

in which the $B_{\alpha\beta}$ are functions of the $p_i$, and $B = |B_{\alpha\beta}|$. As in (17) and (20)
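we define

(35) $$q_i = \int_{R_i} f(x_1, \dots, x_k) \prod dx_\alpha, \qquad q_i' = \int_{R_i'} f(x_1, \dots, x_k) \prod{}' dx_\alpha, \qquad a_\alpha = \int f(x_1, \dots, x_k) \prod{}' dx_\beta = \sum{}' q_i',$$

in which the $R_i$ are the regions bounded by the coordinate hyperplanes and the $R_i'$
are regions into which the coordinate hyperplanes are divided by the remaining
coordinate hyperplanes. $\prod'$ indicates that one of the differentials is omitted
and the variate corresponding to that differential is put equal to zero in
$f(x_1, \dots, x_k)$. $\sum'$ indicates the sum over all $q_i'$ determined by regions lying in
the hyperplane $x_\alpha = 0$. It is clear that

(36) $$p_i \;=\!\cdot\; q_i, \qquad u_\alpha \;=\!\cdot\; \sqrt{2n} \sum_{\beta=1}^{k} \delta_{\alpha\beta}\, a_\beta \tilde{x}_\beta = \sum_{\beta=1}^{k} \delta_{\alpha\beta}\, y_\beta,$$

where $\delta_{\alpha\beta} = \pm 1$ or $0$, and $y_\beta = \sqrt{2n}\,a_\beta \tilde{x}_\beta$.

Making these substitutions in (34) we have

(37) $$S \;=\!\cdot\; (2n/2\pi)^{k/2} \prod_{\alpha=1}^{k} q_{i_\alpha}' \sqrt{C}\, \exp\Big(-n \sum_{\alpha,\beta=1}^{k} C_{\alpha\beta}\, a_\alpha a_\beta \tilde{x}_\alpha \tilde{x}_\beta\Big) \prod_{\alpha=1}^{k} d\tilde{x}_\alpha,$$

and adding together all possible sums of the type (28) we have the asymptotic
form of the distribution of the sample median

(38) $$D(\tilde{x}_1, \dots, \tilde{x}_k) \prod_{\alpha=1}^{k} d\tilde{x}_\alpha \;=\!\cdot\; (2n/2\pi)^{k/2} \prod_{\alpha=1}^{k} a_\alpha \sqrt{C}\, \exp\Big(-n \sum_{\alpha,\beta=1}^{k} C_{\alpha\beta}\, a_\alpha a_\beta \tilde{x}_\alpha \tilde{x}_\beta\Big) \prod_{\alpha=1}^{k} d\tilde{x}_\alpha$$

(39) $$\;=\!\cdot\; (1/2\pi)^{k/2} \sqrt{C}\, \exp\Big(-\tfrac12 \sum_{\alpha,\beta=1}^{k} C_{\alpha\beta}\, y_\alpha y_\beta\Big) \prod_{\alpha=1}^{k} dy_\alpha,$$

in which the $C_{\alpha\beta}$ are functions of the $q_i$.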

4. The case of three dimensions. The computation of the coefficients $C_{\alpha\beta}$
of (39) requires the evaluation of a determinant of order $2^k - k$ for each one of
them. This work was quite laborious even for $k = 3$ and the author made
no attempt to find their explicit expression for larger values of $k$.

If we let a subscript $+$ indicate integration of the density function
$f(x_1, x_2, x_3)$ from $0$ to $\infty$, and a subscript $-$ indicate integration from $-\infty$ to $0$, e.g.

$$f_{+-+} = \int_{x_1 = 0}^{\infty}\int_{x_2 = -\infty}^{0}\int_{x_3 = 0}^{\infty} f(x_1, x_2, x_3)\,dx_3\,dx_2\,dx_1,$$

then the $q_i$ of (35) will be defined as follows


?i =B /+++ 


?» ~ /—H- 


(40) 

g* = /++- 

g« * /" I - " 

?* *= /+— 1 - 

gr - /—+■ 


g« = /+— - 

g* = /— 

The coefficients may be written 



DC 11 = 2 (gi + gi)(gj + ge) 

DC 22 = 2 (gi + ga)(ga + Qi) 

DCm * 2(gi + gt)(gs + 54 ) 

(41) 

DC 12 = gaga + g<ga 

DCis = gaga + g*g7 

1 

1 

s 

$ 

1 

i 

1 

1 


DCaa = gaga + g«g? 

- gig* - gjg* 

where 




D = gig*gsg«f— + — + — + —^ + qtq»q7qt(~ + — + — + 

\q\ gs g> w \q* g« q? q»J 

+ 2(?6 + g#)(gx + <7s)(?i9s + g»g«) 

(42) 

+ 2(g» + g 7 ) (g« + 2s) (gigs + g*g«) 
+ 2(g 6 + g*)(g# + gr)(gig« + g»g») 
+ 8(gig«g«gr + g*g»g*g») 

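(41) and (42) can of course be put in different forms by using the four relations
between the $q_i$. The $a_\alpha$ of (38) are defined in (35); for $k = 3$ they are

(43) $$\begin{aligned}
a_1 &= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(0, x_2, x_3)\,dx_2\,dx_3, \\
a_2 &= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x_1, 0, x_3)\,dx_1\,dx_3, \\
a_3 &= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x_1, x_2, 0)\,dx_1\,dx_2.
\end{aligned}$$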
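5. The normal distribution in two dimensions. If the density function of
the second section of the paper is normal

(44) $$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right],$$

we find that the parameters of (26) are

(45) $$q_1 = \frac14 + \frac{1}{2\pi}\sin^{-1}\rho, \qquad q_2 = \frac14 - \frac{1}{2\pi}\sin^{-1}\rho, \qquad a_1 = \frac{1}{\sqrt{2\pi}\,\sigma_1}, \qquad a_2 = \frac{1}{\sqrt{2\pi}\,\sigma_2}.$$

These give an interesting result: the correlation coefficient of the asymptotic
distribution of the sample medians is

(46) $$\rho_m = \frac{2}{\pi}\sin^{-1}\rho,$$

hence

(47) $$|\rho_m| \le |\rho|,$$

the equality sign holding only when $\rho = 0$ or $\pm 1$.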
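
A short numerical illustration of the relation between $\rho$ and the correlation of the sample medians may help. The sketch below just evaluates $\rho_m = \frac{2}{\pi}\sin^{-1}\rho$; the function name `median_correlation` is a hypothetical label, not notation from the paper.

```python
import math

def median_correlation(rho):
    """Asymptotic correlation of the two sample medians for a bivariate
    normal population with correlation rho: (2 / pi) * arcsin(rho)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return (2.0 / math.pi) * math.asin(rho)
```

For instance, `median_correlation(0.5)` is one third, noticeably smaller than the population correlation, in line with the inequality $|\rho_m| \le |\rho|$.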
SAMPLES FROM TWO BIVARIATE NORMAL POPULATIONS¹

By Chung Tsi Hsu 
Columbia University 

1. Introduction. In multivariate analysis involving p variates, or in analysis 
of variance of m samples from univariate populations, we are often interested 
in the hypothesis of the equality of variances; viz., that 

$\sigma_1 = \sigma_2 = \cdots = \sigma_p$, in the case of $p$ variates;
or

$\sigma_1 = \sigma_2 = \cdots = \sigma_m$, in the case of $m$ samples.

As a matter of fact, it seldom occurs that these hypotheses are true, but the 
ratio between the variances might be known. 

Hotelling [5] has suggested that if

$$\sigma_1^2/k_1 = \sigma_2^2/k_2 = \cdots = \sigma_m^2/k_m = \sigma^2,$$

where the $k$'s are known constants, we can apply the transformation

$$x_1' = w_1x_1, \qquad x_2' = w_2x_2, \qquad \dots, \qquad x_m' = w_mx_m,$$

where

$$w_1\sqrt{k_1} = w_2\sqrt{k_2} = \cdots = w_m\sqrt{k_m} = 1,$$

so that after transformation the variances become equal, i.e.,

$$\sigma_1' = \sigma_2' = \cdots = \sigma_m',$$

and the required analysis can be carried out. This method is similarly
applicable in the multivariate case.
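
The rescaling suggested here is easy to check numerically. The following sketch assumes variances of the form $\sigma_i^2 = k_i\sigma^2$ with known $k_i$ and uses the weights $w_i = 1/\sqrt{k_i}$; the helper names are hypothetical.

```python
import math

def equalizing_weights(k):
    """Weights w_i with w_i * sqrt(k_i) = 1 for known positive constants k_i."""
    return [1.0 / math.sqrt(ki) for ki in k]

def scaled_variances(k, sigma2):
    """Variances of x_i' = w_i x_i when Var(x_i) = k_i * sigma2."""
    return [w * w * ki * sigma2 for w, ki in zip(equalizing_weights(k), k)]
```

With `k = [1.0, 4.0, 9.0]` and `sigma2 = 2.0`, every scaled variance comes out equal to `2.0` up to rounding.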

In a previous paper [7], I developed a series of hypotheses concerning samples
from a bivariate normal population under the assumption that

$$\sigma_1 = \sigma_2.$$

In case $\sigma_1^2/k_1 = \sigma_2^2/k_2$, where $k_1$ and $k_2$ are two distinct known constants,
similar results may be obtained by the use of the transformation $x_1' = w_1x_1$;
$x_2' = w_2x_2$; where $w_1\sqrt{k_1} = w_2\sqrt{k_2} = 1$.


1 Presented to the American Mathematical Society at Washington, D. C., May 3, 1941.
In multivariate analysis, the hypotheses usually of interest concerning correlation
coefficients may be classified in two categories, viz.,

(i) that the correlation coefficient is equal to a specified value, e.g., in
simple correlation $\rho_{12} = \rho_0$, in partial correlation, $\rho_{12.3} = \rho_0$, in multiple
correlation, $\rho_{1.23} = \rho_0$, or in correlation between two sets of variates
[4]², $Q = Q_0$; of special interest is the hypothesis of the vanishing of
such correlation coefficients.

(ii) that two given correlation coefficients are equal, e.g., (1) correlation
coefficients $\rho_{12}$ and $\rho_{13}$ in the correlation matrix of a multivariate distribution
are equal (Hotelling [6]), or (2) the correlation coefficients $\rho_{12}$ and
$\rho_{12}'$ in two bivariate populations are equal.

R. A. Fisher in his earlier paper [3] introduced the transformation
$z = \tfrac12 \log \dfrac{1 + r}{1 - r}$, which provides a very satisfactory, though approximate, method for
the comparison of two correlation coefficients. Brander [1] treated the same
problem by the method of the likelihood ratio criterion.
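
For readers who want to try Fisher's transformation numerically, a minimal Python sketch follows. It evaluates $z = \tfrac12 \log\frac{1+r}{1-r}$ directly; the function name `fisher_z` is a hypothetical label.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = (1/2) * log((1 + r) / (1 - r)) of a
    sample correlation coefficient r with -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))
```

The same quantity is the inverse hyperbolic tangent of $r$, so `fisher_z(r)` agrees with `math.atanh(r)` up to rounding.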

The present paper is an attempt to obtain different criteria by the likelihood
ratio method (Neyman and Pearson [9], [10], [11]) for testing, by means of
samples, the equality of correlation coefficients in two bivariate normal populations
under the following sets of conditions: (1) $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$; (2) $\sigma_1 = \sigma_2$,
$\xi_1 = \xi_2$ and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$. The results may be extended to the cases (3)
$\sigma_1^2/k_1 = \sigma_2^2/k_2$ and $\sigma_1'^2/k_1' = \sigma_2'^2/k_2'$; (4) $\sigma_1^2/k_1 = \sigma_2^2/k_2$, $\xi_1/k_1 = \xi_2/k_2$ and
$\sigma_1'^2/k_1' = \sigma_2'^2/k_2'$, $\xi_1'/k_1' = \xi_2'/k_2'$, where the $k$'s are known constants.


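2. The hypotheses. Two samples, each being of two variates $(x_1, x_2)$ and
$(x_1', x_2')$, of size $N$ and $N'$, are supposed to be drawn at random, respectively,
from two independent normal bivariate populations, with the following distributions:

(1) $$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\Big(\frac{x_1 - \xi_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x_1 - \xi_1}{\sigma_1}\Big)\Big(\frac{x_2 - \xi_2}{\sigma_2}\Big) + \Big(\frac{x_2 - \xi_2}{\sigma_2}\Big)^2\right]\right\},$$

(2) $$\frac{1}{2\pi\sigma_1'\sigma_2'\sqrt{1 - \rho'^2}} \exp\left\{-\frac{1}{2(1 - \rho'^2)}\left[\Big(\frac{x_1' - \xi_1'}{\sigma_1'}\Big)^2 - 2\rho'\Big(\frac{x_1' - \xi_1'}{\sigma_1'}\Big)\Big(\frac{x_2' - \xi_2'}{\sigma_2'}\Big) + \Big(\frac{x_2' - \xi_2'}{\sigma_2'}\Big)^2\right]\right\},$$

where $\xi_1, \xi_2, \sigma_1, \sigma_2, \rho$; $\xi_1', \xi_2', \sigma_1', \sigma_2', \rho'$ are the unknown parameters of the
populations.

The hypotheses to be considered in the present paper are:

$H_1$: Assuming $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, to test $\rho = \rho'$.

$H_2$: Assuming $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$, and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$, to test $\rho = \rho'$.


2 See bibliography at the end of the paper.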
The derivation and the distribution of the criteria for testing these hypotheses
may be simplified by the following simultaneous transformations:

(3) $$X = \frac{1}{\sqrt2}(x_1 - x_2), \qquad Y = \frac{1}{\sqrt2}(x_1 + x_2),$$

(4) $$X' = \frac{1}{\sqrt2}(x_1' - x_2'), \qquad Y' = \frac{1}{\sqrt2}(x_1' + x_2').$$
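The corresponding normal bivariate distributions in the transformed variables
$(X, Y)$ and $(X', Y')$ are obtained, viz.

(5) $$\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{XY}^2}} \exp\left\{-\frac{1}{2(1 - \rho_{XY}^2)}\left[\Big(\frac{X - \xi}{\sigma_X}\Big)^2 - 2\rho_{XY}\Big(\frac{X - \xi}{\sigma_X}\Big)\Big(\frac{Y - \eta}{\sigma_Y}\Big) + \Big(\frac{Y - \eta}{\sigma_Y}\Big)^2\right]\right\},$$

(6) $$\frac{1}{2\pi\sigma_X'\sigma_Y'\sqrt{1 - \rho_{XY}'^2}} \exp\left\{-\frac{1}{2(1 - \rho_{XY}'^2)}\left[\Big(\frac{X' - \xi'}{\sigma_X'}\Big)^2 - 2\rho_{XY}'\Big(\frac{X' - \xi'}{\sigma_X'}\Big)\Big(\frac{Y' - \eta'}{\sigma_Y'}\Big) + \Big(\frac{Y' - \eta'}{\sigma_Y'}\Big)^2\right]\right\}.$$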
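The conditions corresponding to

(7) $$\sigma_1 = \sigma_2 \quad\text{and}\quad \sigma_1' = \sigma_2',$$

are that

(8) $$\rho_{XY} = 0 \quad\text{and}\quad \rho_{XY}' = 0.$$

Also, for a given $\rho$ and $\rho'$, we have from (7)

(9) $$\sigma_Y^2 = \gamma\sigma_X^2 \quad\text{and}\quad \sigma_Y'^2 = \gamma'\sigma_X'^2,$$

where

(10) $$\gamma = \frac{1 + \rho}{1 - \rho} \quad\text{and}\quad \gamma' = \frac{1 + \rho'}{1 - \rho'}.$$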
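
The effect of the rotation $X = (x_1 - x_2)/\sqrt2$, $Y = (x_1 + x_2)/\sqrt2$ on the second moments can be verified with a small sketch. It assumes only the usual rules for variances and covariances of linear combinations; the function name `rotated_moments` is hypothetical.

```python
def rotated_moments(sigma1, sigma2, rho):
    """Second moments of X = (x1 - x2)/sqrt(2) and Y = (x1 + x2)/sqrt(2)
    when x1, x2 have standard deviations sigma1, sigma2 and correlation rho.

    Returns (Var X, Var Y, Cov(X, Y)).
    """
    cov12 = rho * sigma1 * sigma2
    half_sum = 0.5 * (sigma1 ** 2 + sigma2 ** 2)
    var_x = half_sum - cov12
    var_y = half_sum + cov12
    cov_xy = 0.5 * (sigma1 ** 2 - sigma2 ** 2)  # vanishes when sigma1 == sigma2
    return var_x, var_y, cov_xy
```

When the two standard deviations are equal the covariance of $X$ and $Y$ is zero and the variance ratio `var_y / var_x` equals $(1 + \rho)/(1 - \rho)$, which is the quantity called $\gamma$ in the text.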

Following the notation of (9) and (10), the hypotheses $H_1'$ and $H_2'$ corresponding
to $H_1$ and $H_2$ are:

$H_1'$: Assuming $\rho_{XY} = 0$, and $\rho_{XY}' = 0$, to test $\gamma = \gamma'$.

$H_2'$: Assuming $\rho_{XY} = 0$, $\xi = 0$, and $\rho_{XY}' = 0$, $\xi' = 0$, to test $\gamma = \gamma'$.


3. The derivation of the criteria. Let $(x_{1i}, x_{2i})$, $(x_{1j}', x_{2j}')$ be the measurements
of the characters on the $i$th and $j$th individuals in the two samples from their
respective populations. After transformation, the corresponding measurements
become $(X_i, Y_i)$ and $(X_j', Y_j')$. Let $p(E)$ denote the joint elementary probability
law of the $N$ and $N'$ observations, $E = (X_1, \dots, X_N, Y_1, \dots, Y_N,
X_1', \dots, X_{N'}', Y_1', \dots, Y_{N'}')$.

Following Neyman and Pearson, we shall use $\Omega$ to designate the class of admissible
populations under conditions which can be assumed to be satisfied in
any case; and $\omega$ to designate a subclass of $\Omega$ under conditions which are satisfied
only if the hypothesis to be tested is true.

Thus for $H_1'$, $\Omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, any real values of $\xi, \xi', \eta, \eta'$ and
any positive values of $\sigma_X, \sigma_Y, \sigma_X', \sigma_Y'$; $\omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, any real
values of $\xi, \xi', \eta, \eta'$ and any positive values of $\sigma_Y$, $\sigma_Y'$ and $\gamma$ which are defined by (9).
While for $H_2'$, $\Omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, $\xi = \xi' = 0$, any real values of $\eta$ and $\eta'$
and any positive values of $\sigma_X, \sigma_Y, \sigma_X', \sigma_Y'$; $\omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, $\xi = \xi' = 0$,
any real values of $\eta$ and $\eta'$, and any positive values of $\sigma_Y$, $\sigma_Y'$ and $\gamma$ which are defined
by (9).

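For our hypothesis $H_1'$, the values of the parameters required to make $p(\Omega)$
a maximum are:

$$\hat\xi = \bar X, \quad \hat\eta = \bar Y, \quad \hat\sigma_X = s_X, \quad \hat\sigma_Y = s_Y,$$

$$\hat\xi' = \bar X', \quad \hat\eta' = \bar Y', \quad \hat\sigma_X' = s_X', \quad \hat\sigma_Y' = s_Y'.$$

Thus

$$p(\Omega \max) = \left(\frac{1}{2\pi e}\right)^{N + N'} \frac{1}{s_X^{N} s_Y^{N} s_X'^{N'} s_Y'^{N'}}.$$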

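To obtain $p(\omega \max)$, let us define, according to the notation in the writer's
previous paper [7],

$$R_1 = \frac{2rs_1s_2}{s_1^2 + s_2^2}, \qquad R_1' = \frac{2r's_1's_2'}{s_1'^2 + s_2'^2},$$

and

$$u = \frac{s_Y^2}{s_X^2} = \frac{1 + R_1}{1 - R_1}, \qquad u' = \frac{s_Y'^2}{s_X'^2} = \frac{1 + R_1'}{1 - R_1'}.$$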

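Then the values making $p(\omega)$ a maximum are:

$$\hat\xi = \bar X, \quad \hat\eta = \bar Y, \quad \hat\sigma_Y^2 = \tfrac12 s_X^2(\hat\gamma + u),$$

$$\hat\xi' = \bar X', \quad \hat\eta' = \bar Y', \quad \hat\sigma_Y'^2 = \tfrac12 s_X'^2(\hat\gamma + u'),$$

and $\hat\gamma$ is the positive root of the equation

$$(N + N')\gamma^2 - (N - N')(u - u')\gamma - (N + N')uu' = 0$$

or

(11) $$\hat\gamma = \frac{(N - N')(u - u') + \sqrt{(N - N')^2(u - u')^2 + 4(N + N')^2uu'}}{2(N + N')} = \gamma_1, \text{ say.}$$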
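and the likelihood ratio criterion for the hypothesis $H_1'$ is

(12) $$\lambda_1 = \frac{p(\omega \max)}{p(\Omega \max)} = \left[\frac{2\sqrt{\gamma_1}\,s_Y}{(\gamma_1 + u)\,s_X}\right]^{N} \left[\frac{2\sqrt{\gamma_1}\,s_Y'}{(\gamma_1 + u')\,s_X'}\right]^{N'} = \left[\frac{2\sqrt{\gamma_1 u}}{\gamma_1 + u}\right]^{N} \left[\frac{2\sqrt{\gamma_1 u'}}{\gamma_1 + u'}\right]^{N'}.$$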

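For $H_2'$, the values of the parameters to make $p(\Omega)$ a maximum are:

$$\hat\eta = \bar Y, \quad \hat\sigma_X^2 = \frac{\sum X^2}{N}, \quad \hat\sigma_Y = s_Y; \qquad \hat\eta' = \bar Y', \quad \hat\sigma_X'^2 = \frac{\sum X'^2}{N'}, \quad \hat\sigma_Y' = s_Y'.$$

Thus

$$p(\Omega \max) = \left(\frac{1}{2\pi e}\right)^{N + N'} \frac{N^{N/2}\,N'^{N'/2}}{(\sum X^2)^{N/2}\,(\sum X'^2)^{N'/2}\, s_Y^{N}\, s_Y'^{N'}}.$$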

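Similarly, if we write

$$R_2 = \frac{2rs_1s_2 - \tfrac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2}, \qquad R_2' = \frac{2r's_1's_2' - \tfrac12(\bar x_1' - \bar x_2')^2}{s_1'^2 + s_2'^2 + \tfrac12(\bar x_1' - \bar x_2')^2},$$

and

$$v = \frac{Ns_Y^2}{\sum X^2} = \frac{s_Y^2}{s_X^2 + \bar X^2} = \frac{1 + R_2}{1 - R_2}, \qquad v' = \frac{N's_Y'^2}{\sum X'^2} = \frac{1 + R_2'}{1 - R_2'},$$

the values to make $p(\omega)$ a maximum are:

$$\hat\eta = \bar Y, \quad \hat\sigma_Y^2 = \frac{\sum X^2}{2N}(\hat\gamma + v); \qquad \hat\eta' = \bar Y', \quad \hat\sigma_Y'^2 = \frac{\sum X'^2}{2N'}(\hat\gamma + v'),$$

(13) $$\hat\gamma = \frac{(N - N')(v - v') + \sqrt{(N - N')^2(v - v')^2 + 4(N + N')^2vv'}}{2(N + N')} = \gamma_2, \text{ say.}$$

Then

$$p(\omega \max) = \left(\frac{1}{2\pi e}\right)^{N + N'} \left[\frac{2N\sqrt{\gamma_2}}{(\gamma_2 + v)\sum X^2}\right]^{N} \left[\frac{2N'\sqrt{\gamma_2}}{(\gamma_2 + v')\sum X'^2}\right]^{N'},$$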
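and the likelihood ratio criterion for the hypothesis $H_2'$ is

(14) $$\lambda_2 = \frac{p(\omega \max)}{p(\Omega \max)} = \left[\frac{2\sqrt{N\gamma_2}\,s_Y}{(\gamma_2 + v)\sqrt{\sum X^2}}\right]^{N} \left[\frac{2\sqrt{N'\gamma_2}\,s_Y'}{(\gamma_2 + v')\sqrt{\sum X'^2}}\right]^{N'} = \left[\frac{2\sqrt{\gamma_2 v}}{\gamma_2 + v}\right]^{N} \left[\frac{2\sqrt{\gamma_2 v'}}{\gamma_2 + v'}\right]^{N'}.$$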

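The case $N = N'$. The above criteria $\lambda_1$ and $\lambda_2$ cannot in general be expressed
simply, but when $N = N'$, by (11) and (13)

$$\gamma_1 = \sqrt{uu'}, \qquad \gamma_2 = \sqrt{vv'},$$

and

$$\lambda_1 = \left[\frac{4\sqrt{uu'}}{(\sqrt u + \sqrt{u'})^2}\right]^{N}, \qquad \lambda_2 = \left[\frac{4\sqrt{vv'}}{(\sqrt v + \sqrt{v'})^2}\right]^{N},$$

or we may express as monotonic functions of $\lambda_1$ and $\lambda_2$,

(15) $$L_1 = \lambda_1^{2/(N + N')} = \lambda_1^{1/N} = \frac{4\sqrt{uu'}}{(\sqrt u + \sqrt{u'})^2},$$

(16) $$L_2 = \lambda_2^{2/(N + N')} = \lambda_2^{1/N} = \frac{4\sqrt{vv'}}{(\sqrt v + \sqrt{v'})^2}.$$

Thus, $\lambda$'s, $L$'s, or their functions $u/u'$ and $v/v'$ may be used as the criteria in the
present case.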

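Furthermore, if we introduce,

(17) $$z = \tfrac12 \log u, \quad\text{and}\quad z' = \tfrac12 \log u',$$

we have

$$z - z' = \tfrac12 \log \frac{u}{u'}.$$

Thus $L_1$ can be written in terms of $z$ and $z'$

(18) $$L_1 = \frac{4}{e^{(z - z')} + e^{-(z - z')} + 2} = \frac{1}{\cosh^2 \tfrac12(z - z')} = \operatorname{sech}^2 \tfrac12(z - z'),$$

and $z - z' = w$, say, may be used also as a criterion for $H_1'$.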
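
The algebra of this section can be checked numerically. The sketch below assumes the quadratic $(N + N')\gamma^2 - (N - N')(u - u')\gamma - (N + N')uu' = 0$ for the estimate of $\gamma$ and reads the per-observation criterion for equal sample sizes as $\frac{2\sqrt{\gamma u}}{\gamma + u}\cdot\frac{2\sqrt{\gamma u'}}{\gamma + u'}$; both readings are reconstructions, and the function names are hypothetical.

```python
import math

def gamma_root(u, u_prime, n, n_prime):
    """Positive root of (N+N') g^2 - (N-N')(u-u') g - (N+N') u u' = 0."""
    s = n + n_prime
    d = (n - n_prime) * (u - u_prime)
    return (d + math.sqrt(d * d + 4.0 * s * s * u * u_prime)) / (2.0 * s)

def l1_equal_sizes(u, u_prime):
    """Per-observation criterion for N = N', computed from the root."""
    g = gamma_root(u, u_prime, 1, 1)  # equals sqrt(u * u') when N = N'
    return (2.0 * math.sqrt(g * u) / (g + u)) * (2.0 * math.sqrt(g * u_prime) / (g + u_prime))

def l1_from_z(u, u_prime):
    """The same criterion written as sech^2((z - z') / 2), z = log(u) / 2."""
    w = 0.5 * math.log(u) - 0.5 * math.log(u_prime)
    return 1.0 / math.cosh(0.5 * w) ** 2
```

For any positive `u` and `u_prime` the two functions agree up to rounding, and both equal one exactly when `u == u_prime`, which corresponds to perfect agreement of the two samples.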

We shall now proceed to obtain the distributions of some of these statistics. 

4. The distributions of $u/u'$ and $v/v'$. Since $Ns_Y^2/\sigma_Y^2$ and $Ns_X^2/\sigma_X^2$ have
independently the $\chi^2$ distribution with $N - 1$ degrees of freedom,

$$u = \frac{s_Y^2}{s_X^2} = \frac{\sigma_Y^2\chi_2^2}{\sigma_X^2\chi_1^2} = \frac{\gamma\chi_2^2}{\chi_1^2}$$

and $u/\gamma$ has the $F$ distribution with degrees of freedom $f_1 = N - 1$, $f_2 = N - 1$.
Similarly, $u'/\gamma' = \chi_2'^2/\chi_1'^2$ has the $F$ distribution with the same numbers of
degrees of freedom (since $N = N'$, in the present case).
If the hypothesis Hi is true (i.e., y — y') 


(19) 


u = xlxi* _ O'i0t 2i 

«' xJx? Mi 2*’ 


where 0<(—$x<) or is distributed as 


<*» &**<**• 

with Oi = i(iV — 1), and Zi(= oio t ), Zs(= fli^i) follow independently the Wilks' 
2 -distribution, [14], which we shall study in detail for the present case. 
Distribution of z when p — 2: Consider 

2 * BOidt • • • Op. 

Wilks has succeeded in integrating the distribution of z for the case p = 2 for 
special values of a’s, e.g., ai = \(N — 1), a* = i(N — 2). Now we want the 
distribution of z when p = 2 and for any values of o, and then for Oi = o* «= 
i(AT - 1 ). 

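By (20) the joint distribution of $\theta_1$ and $\theta_2$ is

$\dfrac{\theta_1^{a_1-1}e^{-\theta_1}\,\theta_2^{a_2-1}e^{-\theta_2}}{\Gamma(a_1)\Gamma(a_2)}\,d\theta_1\,d\theta_2.$

Applying the transformation $z = B\theta_1\theta_2$, $\theta_1 = v_1$, the joint distribution of $v_1$, $z$ is

$\dfrac{v_1^{a_1-1}e^{-v_1}\left(\dfrac{z}{Bv_1}\right)^{a_2-1}e^{-z/(Bv_1)}}{\Gamma(a_1)\Gamma(a_2)}\,\dfrac{dv_1\,dz}{Bv_1}.$

Integrating $v_1$ from $v_1 = 0$ to $v_1 = \infty$, we have the distribution of $z$, viz.,

(21) $\dfrac{z^{a_2-1}\,dz}{B^{a_2}\Gamma(a_1)\Gamma(a_2)}\displaystyle\int_0^\infty v_1^{a_1-a_2-1}e^{-v_1 - z/(Bv_1)}\,dv_1.$

In order to evaluate the integral of (21), consider the transformation $v_1 = y^2$, $dv_1 = 2y\,dy$; we have

(22) $I_0 = 2\displaystyle\int_0^\infty y^{2(a_1-a_2)-1}e^{-y^2 - z/(By^2)}\,dy.$

To evaluate $I_0$ for any $a$'s, by putting $y = 1/x$, $dy = -dx/x^2$, we have

(23) $I_0 = 2\displaystyle\int_0^\infty x^{-2(a_1-a_2)-1}e^{-zx^2/B - 1/x^2}\,dx.$

Consider

(24) $\Gamma(a_1 - a_2 + \tfrac12) = \displaystyle\int_0^\infty e^{-y}y^{a_1-a_2-\frac12}\,dy.$

Then, replacing $y$ by $x^2y$ in (24) and interchanging the order of integration,

$I_0\,\Gamma(a_1 - a_2 + \tfrac12) = 2\displaystyle\int_0^\infty x^{-2(a_1-a_2)-1}e^{-zx^2/B - 1/x^2}\,dx\int_0^\infty e^{-y}y^{a_1-a_2-\frac12}\,dy$

$\qquad = 2\displaystyle\int_0^\infty y^{a_1-a_2-\frac12}\,dy\int_0^\infty e^{-[(z/B + y)x^2 + 1/x^2]}\,dx$

$\qquad = \sqrt{\pi}\displaystyle\int_0^\infty \dfrac{e^{-2\sqrt{z/B + y}}}{\sqrt{z/B + y}}\,y^{a_1-a_2-\frac12}\,dy.$

By the substitution $\sqrt{z/B + y} = \sqrt{z/B} + x$, or $y = x^2 + 2\sqrt{z/B}\,x$, we have $dy = 2\left(x + \sqrt{z/B}\right)dx$, and therefore

(25) $I_0\,\Gamma(a_1 - a_2 + \tfrac12) = 2\sqrt{\pi}\displaystyle\int_0^\infty e^{-2(\sqrt{z/B} + x)}\left(x^2 + 2x\sqrt{z/B}\right)^{a_1-a_2-\frac12}dx.$

Hence, $z$ is distributed as

(26) $\dfrac{2\sqrt{\pi}\,z^{a_2-1}e^{-2\sqrt{z/B}}\,dz}{B^{a_2}\,\Gamma(a_1)\Gamma(a_2)\Gamma(a_1 - a_2 + \frac12)}\displaystyle\int_0^\infty e^{-2x}\left(x^2 + 2x\sqrt{z/B}\right)^{a_1-a_2-\frac12}dx.$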

We infer from this distribution that when $2(a_1 - a_2)$, i.e., the difference of
degrees of freedom, is odd, the integral can be expressed as a terminated series;
but for even values of $2(a_1 - a_2)$, the series is infinite.
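
As a numerical sanity check on (22) and (25), the following sketch compares the two integral forms of $I_0$ with a plain midpoint rule. It is only an illustration under stated assumptions: the helper names (`midpoint`, `lhs_I0`, `rhs_I0`), the truncation limits and the step count are choices of this sketch, not of the paper, and `a` stands for $a_1 - a_2$ while `c` stands for $z/B$.

```python
import math

def midpoint(f, lo, hi, steps=40000):
    """Composite midpoint rule on [lo, hi] (hypothetical helper)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (j + 0.5) * h) for j in range(steps))

def lhs_I0(a, c):
    """I0 as in (22): 2 * int_0^inf y^(2a-1) exp(-y^2 - c/y^2) dy."""
    return 2.0 * midpoint(
        lambda y: y ** (2 * a - 1) * math.exp(-y * y - c / (y * y)), 0.0, 12.0)

def rhs_I0(a, c):
    """I0 from (25): the integral there divided by Gamma(a + 1/2)."""
    r = math.sqrt(c)
    integral = midpoint(
        lambda x: math.exp(-2.0 * (r + x)) * (x * x + 2.0 * x * r) ** (a - 0.5),
        0.0, 40.0)
    return 2.0 * math.sqrt(math.pi) * integral / math.gamma(a + 0.5)

if __name__ == "__main__":
    for a in (0.5, 1.5, 2.0):
        print(a, lhs_I0(a, 0.7), rhs_I0(a, 0.7))
```

For `a = 0.5` the power in (25) disappears and the right-hand side reduces to $\sqrt{\pi}\,e^{-2\sqrt{c}}$, which is the simplest instance of the terminated case described above.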

When B — ~, «i = |(J\T — 1), 02 = %(N — 2), (26) is reduced to 
A. 

( 27 ) 

f(oi)r(a2) 

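which is Wilks' $z$ distribution, [15], for $p = 2$.

When $B = 1$ and $a_1 = a_2 = \tfrac12(N - 1)$, it becomes

(28) $\dfrac{2z^{a_1-1}\,dz}{\Gamma(a_1)\Gamma(a_2)}\displaystyle\int_0^\infty e^{-2(\sqrt{z} + x)}\left(2\sqrt{z} + x\right)^{-\frac12}x^{-\frac12}\,dx,$

which is the distribution of $z$ involved in (19).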

Since (28) can apparently not be simplified, I have been unable thus far to
find in manageable form the distribution of the ratio $z_1/z_2$ and therefore of $u/u'$
in this case. However, it would be simpler to use the alternative criterion
$w = z - z'$ for the hypothesis $H_1$. The distribution of $w$ will be taken up in a
later section.

The distribution of $v/v'$: Since $Ns_1^2/\sigma_1^2$ and $\sum x_2^2/\sigma_2^2$ have independently the $\chi^2$
distribution with $N - 1$ and $N$ degrees of freedom respectively, therefore,

NSj 
2 Z 2 


_ jya r _ <rVx I _ 7Xs 

is - * > 
VxXi Xi 


and 


v /N — 1 
y/ N 


y 

■■ N. 


has the f-distribution with/i = N — 1 degrees of freedom and 


v > /N — i 

Similarly j baa the F-distribution with degrees of freedom/! and/* 


as above. 

If the hypothesis H* is true (i.e., y = y'), 

2 '2 V 

V _ X 2 X 1 __ 0102 

*»' 2 '2 a J 

V XlX2 0102 


Zl 

f 

Zl 


where each $\theta_i$ is distributed as in (19), but with $a_1 = \tfrac12 N$ and $a_2 = \tfrac12(N - 1)$.
We can infer from (27) that $t_1 = 4\sqrt{z_1}$ and $t_2 = 4\sqrt{z_2}$ have independently
the $\chi^2$-distribution, each with $4a_2$ or $2(N - 1)$ degrees of freedom, and $t_1/t_2 =
\sqrt{z_1/z_2} = \sqrt{v/v'}$ follows the F-distribution with degrees of freedom $f_1 = f_2 =
2(N - 1)$. The 5% and 1% points of $F = \sqrt{v/v'}$ may be obtained from
Snedecor's table ([12], p. 174).
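
The change of variable behind this statement can be checked pointwise. The sketch below assumes the form of (27) with $B = 1$ and $a_1 = a_2 + \tfrac12$, namely $\sqrt{\pi}\,z^{a_2-1}e^{-2\sqrt{z}}/[\Gamma(a_1)\Gamma(a_2)]$ (the constant is taken from the duplication formula and is an assumption of this sketch), and compares the implied density of $t = 4\sqrt{z}$ with the $\chi^2$ density on $4a_2$ degrees of freedom. The function names are hypothetical.

```python
import math

def z_density(z, a2):
    """Assumed form of (27) with B = 1 and a1 = a2 + 1/2."""
    a1 = a2 + 0.5
    log_g = (0.5 * math.log(math.pi) + (a2 - 1.0) * math.log(z)
             - 2.0 * math.sqrt(z) - math.lgamma(a1) - math.lgamma(a2))
    return math.exp(log_g)

def t_density_via_z(t, a2):
    """Density of t = 4*sqrt(z): z = t^2/16, so dz/dt = t/8."""
    return z_density(t * t / 16.0, a2) * t / 8.0

def chi2_density(t, k):
    """Chi-square density with k degrees of freedom."""
    half = 0.5 * k
    return math.exp((half - 1.0) * math.log(t) - 0.5 * t
                    - half * math.log(2.0) - math.lgamma(half))

if __name__ == "__main__":
    N = 10
    a2 = 0.5 * (N - 1)          # so 4*a2 = 2(N - 1) degrees of freedom
    for t in (5.0, 18.0, 30.0):
        print(t, t_density_via_z(t, a2), chi2_density(t, 2 * (N - 1)))
```

The two columns printed should agree to rounding error, which is the content of the inference drawn from (27).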


5. The distribution of $y = \log z$. Wald [13] has suggested that the distribution of $z = B\theta_1\theta_2\cdots\theta_p$ for any $a_i$'s $(i = 1, \cdots, p)$ may also be obtained indirectly with the aid of the characteristic function. A similar method has been applied in a recent paper by Wald and Brookner [14]. Consider the transformation

(29) $y = \log z = \log B\theta_1\theta_2\cdots\theta_p.$

The characteristic function of $y$ is

(30) $\phi(t) = E(e^{ty}) = \dfrac{B^t\,\Gamma(a_1 + t)\Gamma(a_2 + t)\cdots\Gamma(a_p + t)}{\Gamma(a_1)\Gamma(a_2)\cdots\Gamma(a_p)}.$

Thus the distribution $f(y)\,dy$ is given by

(31) $f(y) = \dfrac{1}{2\pi i}\displaystyle\int_{-i\infty}^{i\infty}e^{-ty}\phi(t)\,dt = \dfrac{1}{2\pi i}\int_{-i\infty}^{i\infty}e^{-ty}B^t\prod_{i=1}^{p}\dfrac{\Gamma(a_i + t)}{\Gamma(a_i)}\,dt.$

Without loss of generality, we may take $a_1 \ge a_2 \ge \cdots \ge a_p > 0$ and let $a_p + t = -t'$; then

(32) $f(y) = \dfrac{c_p}{2\pi i}\displaystyle\int_{-a_p - i\infty}^{-a_p + i\infty}e^{yt'}B^{-t'}\prod_{i=1}^{p}\Gamma(a_i - a_p - t')\,dt',$

where $c_p = e^{a_py}B^{-a_p}\Big/\prod_{i=1}^{p}\Gamma(a_i)$.

The integration can be carried out by the method of residues along the contour $C$, bounded by the line $x = -a_p$ and that part of the circle with center at the origin and radius $r$ which lies to the right of the line $x = -a_p$. The integral of the function $e^{t'y}B^{-t'}\prod_{i=1}^{p}\Gamma(a_i - a_p - t')$ along the arc converges to zero as the radius of the circle tends to infinity (Kullback, [8]). Hence the integrals along the vertical line $x + a_p = 0$ and along the closed contour $C$ are equal. Then we may write

(33) $f(y) = \dfrac{c_p}{2\pi i}\displaystyle\oint_C e^{yt'}B^{-t'}\prod_{i=1}^{p}\Gamma(a_i - a_p - t')\,dt',$

and its value is $c_p$ times the sum of the residues at the poles within the contour $C$.

For the present purpose, $p = 2$, we have

(34) $f(y) = \dfrac{c_2}{2\pi i}\displaystyle\int_{-a_2 - i\infty}^{-a_2 + i\infty}e^{yt'}\,\Gamma(a_1 - a_2 - t')\Gamma(-t')\,dt'.$

We shall study the integral of (34) in more detail in the following cases:

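(i) $a_1 - a_2 = \tfrac12$. By the duplication formula

$\Gamma(\tfrac12 - t')\Gamma(-t') = 2^{1+2t'}\sqrt{\pi}\,\Gamma(-2t'),$

and the function

$\Gamma(-2t') = \displaystyle\lim_{N\to\infty}\dfrac{N!\,N^{-2t'}}{(-2t')(-2t' + 1)\cdots(-2t' + N)}$

has simple poles at the points $0, \tfrac12, 1, \tfrac32, \cdots$. The residue at $t' = m/2$, where $m$ is zero or a positive integer, is $(-1)^{m+1}/(2\cdot m!)$ and (34) becomes

(35) $f(y) = \sqrt{\pi}\,c_2\left(1 - 2e^{\frac12 y} + \dfrac{1}{2!}\,2^2e^{y} - \cdots\right) = \sqrt{\pi}\,c_2\,e^{-2e^{y/2}}.$

The distribution of $z = e^y$ is

(27 bis) $\dfrac{\sqrt{\pi}\,z^{a_2-1}e^{-2\sqrt{z}}\,dz}{\Gamma(a_1)\Gamma(a_2)}.$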

(ii) $a_1 - a_2 = m + \tfrac12$. The function

$\Gamma(a_1 - a_2 - t')\Gamma(-t') = (m - \tfrac12 - t')(m - \tfrac32 - t')\cdots(\tfrac12 - t')\,\Gamma(\tfrac12 - t')\Gamma(-t')$

$\qquad = 2^{1+2t'}\sqrt{\pi}\,\Gamma(-2t')\,(m - \tfrac12 - t')(m - \tfrac32 - t')\cdots(\tfrac12 - t')$

has simple poles at $0, 1, \cdots, m, m + \tfrac12, m + 1, \cdots$, and


f(v) * %/*■«*£, 

- 


1 


(2m — 1)! 

2*- 1 (m - 1)! “ 2**(2m) 


(2V) m + 


(2*«*)* h 


(2m - 1)1 


-If 

Dm 


2"(2m + 1) 

2-(2m + 2).2-l + 

1 l ( 2 , «T' hr/ *]. 


•] 


2 s *-»("i - 1)! 2" (2m + 7 ) 7 ! 

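This agrees with the expansion of (26) when we put $a_1 - a_2 - \tfrac12 = m$.

(iii) $a_1 - a_2 = 0$. The function

$[\Gamma(-t')]^2 = \displaystyle\lim_{N\to\infty}\dfrac{(N!)^2\,N^{-2t'}}{(-t')^2(-t' + 1)^2\cdots(-t' + N)^2}$

has poles of the second order at the points $0, 1, 2, 3, \cdots$ and

$f(y) = c_2\displaystyle\sum_{r=0}^{\infty}\left\{\dfrac{\partial}{\partial t'}\left[(t' - r)^2e^{yt'}[\Gamma(-t')]^2\right]\right\}_{t'=r}.$

(iv) $a_1 - a_2 = m$. The function

$\Gamma(m - t')\Gamma(-t') = (m - 1 - t')(m - 2 - t')\cdots(1 - t')(-t')[\Gamma(-t')]^2$

has simple poles at $0, 1, 2, \cdots, m - 1$ and poles of the second order at $m, m + 1, \cdots$, and

$f(y) = c_2\displaystyle\sum_{r=0}^{m-1}\left\{(t' - r)e^{yt'}\Gamma(m - t')\Gamma(-t')\right\}_{t'=r} + c_2\sum_{r=m}^{\infty}\left\{\dfrac{\partial}{\partial t'}\left[(t' - r)^2e^{yt'}\Gamma(m - t')\Gamma(-t')\right]\right\}_{t'=r}.$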


6. The distribution of $w = z - z'$ or $\psi = \cosh w$. Since the distribution of $u$
is given in [7] as


(39) 


7B [i(N 




therefore, by transformation (17), we have that the distribution of $z$ for a given $\zeta = \tfrac12\log\gamma = \tfrac12\log\dfrac{1 + \rho}{1 - \rho}$ is

(40) $\dfrac{1}{B\left(\frac{n}{2}, \frac12\right)}\operatorname{sech}^n(z - \zeta)\,dz,$

where $n = N - 1$. The distribution of $z$ has been given by R. A. Fisher [3] for $n = 1$ and by Delury [2]. Similarly, the distribution of $z'$ for a given $\zeta'$ is

(41) $\dfrac{1}{B\left(\frac{n'}{2}, \frac12\right)}\operatorname{sech}^{n'}(z' - \zeta')\,dz',$

where $n' = N' - 1$.

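In case $n = n'$, the joint distribution of $z$ and $z'$ for a given common $\zeta$ is

(42) $C\operatorname{sech}^n(z - \zeta)\operatorname{sech}^n(z' - \zeta)\,dz\,dz',$

where $1/C = \left[B\left(\tfrac{n}{2}, \tfrac12\right)\right]^2$.

By the transformation $\bar z = \tfrac12(z + z')$, $w = z - z'$, we have the joint distribution of $\bar z$ and $w$,

(43) $\dfrac{C\,d\bar z\,dw}{\left[\cosh(z - \zeta)\cosh(z' - \zeta)\right]^n} = \dfrac{2^nC\,d\bar z\,dw}{\left[\cosh 2(\bar z - \zeta) + \cosh w\right]^n}.$

Integrating with respect to $\bar z$ from $-\infty$ to $\infty$, we have

(44) $2^nC\,dw\displaystyle\int_{-\infty}^{\infty}\dfrac{d\bar z}{\left[\cosh 2(\bar z - \zeta) + \cosh w\right]^n} = 2^nC\,dw\int_{\zeta}^{\infty}\dfrac{2\,d\bar z}{\left[\cosh 2(\bar z - \zeta) + \cosh w\right]^n} = 2^nC\,dw\,I_n$, say.

Applying the transformation $\phi = 2(\bar z - \zeta)$, $\psi = \cosh w$, the integral of (44) becomes

$I_n = \displaystyle\int_0^\infty\dfrac{d\phi}{(\cosh\phi + \psi)^n}.$

Substituting $\cosh\phi + \psi = \dfrac{1 + \psi}{\theta}$, we have

(45) $I_n = \dfrac{1}{(1 + \psi)^n}\displaystyle\int_0^1\theta^{n-1}(1 - \theta)^{-\frac12}\left(1 - \dfrac{\psi - 1}{\psi + 1}\,\theta\right)^{-\frac12}d\theta.$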

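Comparing (45) with the hypergeometric function

(46) $I = \displaystyle\int_0^1\theta^{b-1}(1 - \theta)^{c-b-1}(1 - x\theta)^{-a}\,d\theta = \dfrac{\Gamma(b)\Gamma(c - b)}{\Gamma(c)}\,F(a, b, c, x),$

we have $b = n$, $c - b = \tfrac12$, $a = \tfrac12$, and therefore (45) can be expressed in terms of a hypergeometric series as

(47) $I_n = \dfrac{\Gamma(n)\Gamma(\frac12)}{\Gamma(n + \frac12)}\,\dfrac{1}{(1 + \psi)^n}\,F\!\left(\tfrac12,\ n,\ n + \tfrac12,\ \dfrac{\psi - 1}{\psi + 1}\right).$

The series (47) is convergent since $\dfrac{\psi - 1}{\psi + 1}$ is less than unity. Thus the distribution of $w$, from (44), is

(48) $\dfrac{2^nC\,\Gamma(n)\Gamma(\frac12)}{\Gamma(n + \frac12)}\,\dfrac{1}{(\cosh w + 1)^n}\,F\!\left(\tfrac12,\ n,\ n + \tfrac12,\ \dfrac{\cosh w - 1}{\cosh w + 1}\right)dw,$

and the distribution of $\psi = \cosh w$ is

(49) $\dfrac{2^{n+1}C\,\Gamma(n)\Gamma(\frac12)}{\Gamma(n + \frac12)}\,\dfrac{1}{(\psi + 1)^{n+\frac12}(\psi - 1)^{\frac12}}\,F\!\left(\tfrac12,\ n,\ n + \tfrac12,\ \dfrac{\psi - 1}{\psi + 1}\right)d\psi.$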

We notice that the distribution of $\psi$ expressed in (49) is very similar to the
r-distribution expressed in terms of hypergeometric series, except that in the

yb — | J m 

first case the argument is , ---, while in the second case it is ; — ; —~ where 

^ + 1 1 + v 

p = pr. Hotelling [5] has obtained a very rapidly convergent hypergeometric 

series for the distribution of the correlation coefficient since $|\rho| < 1$. But
for the distribution of $\psi$ we cannot obtain a more rapidly convergent series than
(49), since the values of $\psi$ lie between 1 and $\infty$.
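
The reduction of the integral $I_n = \int_0^\infty d\phi/(\cosh\phi + \psi)^n$ to a hypergeometric series can be checked numerically. The sketch below is an illustration only: it assumes the series form $\Gamma(n)\Gamma(\tfrac12)/\Gamma(n + \tfrac12)\cdot(1 + \psi)^{-n}F(\tfrac12, n, n + \tfrac12, (\psi - 1)/(\psi + 1))$, sums the Gauss series directly, and compares it with a midpoint-rule value of the integral. All function names, the truncation point and the number of terms are assumptions of this sketch.

```python
import math

def integral_In(n, psi, upper=40.0, steps=100000):
    """Midpoint-rule value of I_n = int_0^inf dphi / (cosh(phi) + psi)^n."""
    h = upper / steps
    return h * sum((math.cosh((j + 0.5) * h) + psi) ** (-n) for j in range(steps))

def hyp2f1(a, b, c, x, terms=400):
    """Partial sum of the Gauss series F(a, b, c, x); intended for |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
    return total

def series_In(n, psi):
    """Hypergeometric form of I_n with argument (psi - 1)/(psi + 1)."""
    x = (psi - 1.0) / (psi + 1.0)
    const = math.gamma(n) * math.gamma(0.5) / math.gamma(n + 0.5)
    return const * (1.0 + psi) ** (-n) * hyp2f1(0.5, n, n + 0.5, x)

if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, integral_In(n, 2.0), series_In(n, 2.0))
```

Because the argument $(\psi - 1)/(\psi + 1)$ approaches 1 as $\psi$ grows, the direct summation used here slows down for large $\psi$; that is the same limitation on the rate of convergence that the text notes for the distribution of $\psi$.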


7. Summary and remark. Two hypotheses concerning the comparison of
correlation coefficients of two samples from bivariate normal populations have
been considered. The appropriate test criteria for each hypothesis have been
derived by the use of a transformation of the variates. The distributions of
certain of the criteria have been obtained in the special case where $N = N'$.
Incidentally the distribution of Wilks' $z$ for $p = 2$ and any values of $a_1$ and $a_2$
has been derived.

Again, though we assume throughout the paper that $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, the
tests can be generalized to fit the case where the ratios $\sigma_1/\sigma_2 = k$, $\sigma_1'/\sigma_2' = k'$
are known, but are different from unity. In the latter case we can apply the
transformation

$y_1 = w_1x_1, \quad y_2 = w_2x_2;$

$y_1' = w_1'x_1', \quad y_2' = w_2'x_2';$

where

$w_1k_1 = w_2k_2 = 1, \quad w_1'k_1' = w_2'k_2' = 1,$

so that after transformation the variances of each pair of $y$'s are equal.

The writer is deeply indebted to Professor Harold Hotelling and Dr. Abraham
Wald for their advice and suggestions in the preparation of this paper.

ON RANDOMNESS IN ORDERED SEQUENCES

By L. C. Young 

Westinghouse Electric and Manufacturing Company 

It is frequently desirable to examine an ordered sequence of measurements 
for the presence of non-random variability, concern over any particular type of 
variability being limited. Unless the sequence is one containing replicated 
observations, current methods of analysis often restrict an investigation to 
tests for specific forms of variability, such as particular orders of regression and 
periodicity. In order to simulate replication, arbitrary grouping of data is 
occasionally used and followed by some test of variance; this practice, however, 
is likely to add an element of bias to the investigation. 

Under these conditions, it would be convenient to have the means of testing a 
series for the presence of general regression, before proceeding to test for that of 
a specific type. It is the purpose of this paper to present, as briefly as possible, 
a statistic designed for this preliminary type of examination, and to demonstrate 
its application. 

If a given sequence of measurements be denoted by

$X_1, X_2, \cdots, X_n,$

then the magnitude of

$C = 1 - \dfrac{\sum_{i=1}^{n-1}(X_i - X_{i+1})^2}{2\sum_{i=1}^{n}(X_i - \bar X)^2}$

will be dependent upon the arrangement of the $n$ observations upon which it is based. $C$ will have $n!$ possible values for a given sample, corresponding to the number of permutations of $n$ items.

1. Moments of the distribution of C in terms of the moments of a finite sequence. Writing $C$ in terms of $x_1, \cdots, x_n$, representing the deviations of $X_1, \cdots, X_n$ from their sample mean of $n$ measurements,

$C = 1 - \dfrac{\sum_{i=1}^{n-1}(x_i - x_{i+1})^2}{2\sum_{i=1}^{n}x_i^2} = \dfrac{x_1^2 + x_n^2 + 2\sum_{i=1}^{n-1}x_ix_{i+1}}{2\sum_{i=1}^{n}x_i^2}.$

In order to find the mean value of $C$ for a given sample, it must be summed over all values obtained from the $n!$ permutations of the measurements. Dealing with the numerator alone of the expression given above:

$S_p\left(x_1^2 + x_n^2 + 2\sum_{i=1}^{n-1}x_ix_{i+1}\right) = S_p\,x_1^2 + S_p\,x_n^2 + 2S_p\sum_{i=1}^{n-1}x_ix_{i+1},$

where $S_p$ denotes summation over the $n!$ permutations.

There are $n$ values of $x_i$, and $n!$ arrangements. Each value $x_i$ is $x_1$ in $(n - 1)!$ of the arrangements: the same reasoning applies to $x_n$. The first two terms of the summation, therefore, will be

$S_p\,x_1^2 = S_p\,x_n^2 = (n - 1)!\sum_{i=1}^{n}x_i^2.$

With regard to the third term, there are $2(n - 1)$ of such cross-products for each arrangement. Since the summation is taken over $n!$ arrangements, $x_jx_k$ will be different from $x_kx_j$, and should be considered a separate term. Each cross-product term, therefore, must occur $\dfrac{2(n - 1)\,n!}{n(n - 1)}$ times throughout the $n!$ arrangements, since there are $n(n - 1)$ possible cross-products among $n$ different items. The third term, then, will be

$2S_p\left(\sum_{i=1}^{n-1}x_ix_{i+1}\right) = 2(n - 1)!\sum_{j \ne k}x_jx_k = -2(n - 1)!\sum_{i=1}^{n}x_i^2,$

the last step following because the deviations sum to zero, from which it may be seen that the mean value of $C$ is zero for any sample.

The same method may be applied in order to find the second and higher moments of $C$. Squaring the numerator of the expression and expanding,

$\left(x_1^2 + x_n^2 + 2\sum_{i=1}^{n-1}x_ix_{i+1}\right)^2 = x_1^4 + x_n^4 + 2x_1^2x_n^2 + 4x_1^2\sum_{i=1}^{n-1}x_ix_{i+1} + 4x_n^2\sum_{i=1}^{n-1}x_ix_{i+1} + 4\left(\sum_{i=1}^{n-1}x_ix_{i+1}\right)^2.$

Performing the summation $S_p$ term by term we obtain

$\dfrac{S_p\left(x_1^2 + x_n^2 + 2\sum x_ix_{i+1}\right)^2}{n!} = \dfrac{2(2n - 3)\left(\sum x_i^2\right)^2 - 2n\sum x_i^4}{n(n - 1)},$

whence the second moment of $C$ for any sample is given by

$\mu_2 = \dfrac{2n - 3 - m_4/m_2^2}{2n(n - 1)},$

where $m_2$ and $m_4$ are the second and fourth moments, respectively, of the $n$
observations about their mean.
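
The first two permutation moments can be verified by brute force for a small sample. The sketch below enumerates all $n!$ orderings of six arbitrary values, computes $C$ for each, and compares the results with a zero mean and with $\mu_2 = (2n - 3 - m_4/m_2^2)/[2n(n - 1)]$. The function names and the sample values are assumptions made for illustration.

```python
import itertools

def young_c(seq):
    """C = 1 - sum of squared successive differences / (2 * sum of squared deviations)."""
    n = len(seq)
    mean = sum(seq) / n
    num = sum((seq[i] - seq[i + 1]) ** 2 for i in range(n - 1))
    den = 2.0 * sum((v - mean) ** 2 for v in seq)
    return 1.0 - num / den

def permutation_moments(sample):
    """Mean and second moment (about zero) of C over all orderings of the sample."""
    values = [young_c(p) for p in itertools.permutations(sample)]
    m1 = sum(values) / len(values)
    mu2 = sum(v * v for v in values) / len(values)
    return m1, mu2

def mu2_formula(sample):
    """Second moment of C from the sample moment ratio m4 / m2^2."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    return (2 * n - 3 - m4 / m2 ** 2) / (2 * n * (n - 1))

if __name__ == "__main__":
    sample = [7.4, 8.8, 11.4, 10.3, 11.9, 12.2]   # arbitrary illustrative values
    print(permutation_moments(sample), mu2_formula(sample))
```

Exhaustive enumeration is only practical for small $n$ (here $6! = 720$ orderings), which is why closed-form moments are needed for samples of realistic length.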

In like manner, the third and fourth moments of the distribution of C for a 
given sample of n observations are found to be 
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-0 + 4(«-3)™l + 9^- 

o 

d j 


Mi 

= 7712 trig 

mj 


4«(n — l)(n — 2) 



Mi 

8n*(n — 1)(« — 2)(n — 3) \^ n 

3) 2 - 48n(4n - 9)—* 
m 


- 24n(3n 2 - 17n + 27) ^ + 

(8n* - 

45 n 2 - 23n 4- 210) 


m% 




+ 16(2» 2 + 5n - 21) + 

4(17n 2 

- 37n + 42) — ! 


m 2 


m\ 


- (7n 2 + 13n - 6) - { j 


2. Distribution of C for samples drawn from a normal universe. The first four moments of the distribution of $C$ for samples drawn from a given population may be derived from the above formulae by substituting the mean values of $m_4/m_2^2$, $m_3^2/m_2^3$, etc. of samples from such a population. For normal samples containing $n$ observations, for example, the following mean values apply, as obtained by the method presented by R. A. Fisher [1, 2]:

$\dfrac{m_3^2}{m_2^3} = \dfrac{6(n - 2)}{(n + 1)(n + 3)},$

$\dfrac{m_4}{m_2^2} = \dfrac{3(n - 1)}{(n + 1)},$

$\dfrac{m_4^2}{m_2^4} = \dfrac{3(3n^3 + 23n^2 - 63n + 45)}{(n + 1)(n + 3)(n + 5)},$

$\dfrac{m_5m_3}{m_2^4} = \dfrac{60(n - 1)(n - 2)}{(n + 1)(n + 3)(n + 5)},$

$\dfrac{m_6}{m_2^3} = \dfrac{15(n - 1)^2}{(n + 1)(n + 3)},$

$\dfrac{m_8}{m_2^4} = \dfrac{105(n - 1)^3}{(n + 1)(n + 3)(n + 5)}.$

Replacement of the sample moment ratios by the mean values of those ratios for normal samples yields the following moments of $C$:

$\mu_2 = \dfrac{n - 2}{(n - 1)(n + 1)}, \qquad \mu_3 = 0, \qquad \mu_4 = \dfrac{3(n^2 + 2n - 12)}{(n - 1)(n + 1)(n + 3)(n + 5)}.$

Compatible results for the case of normal samples have been obtained by Williams [3], using another method.

From the above results, the value of

$\beta_2 = \dfrac{3(n^2 + 2n - 12)(n - 1)(n + 1)}{(n - 2)^2(n + 3)(n + 5)}$

is seen to approach 3, its value for the normal distribution, as the sample size is increased.

Inasmuch as the distribution of $C$ for normal samples is limited in both directions and is symmetrical, it is apparent that the Pearson Type II distribution may be considered representative. Fitting this curve to the moments given above, the equation of the frequency distribution is given by

$y = y_0\left(1 - \dfrac{C^2}{a^2}\right)^m,$

where

$m = \dfrac{n^4 - n^3 - 13n^2 + 37n - 60}{2(n^3 - 13n + 24)},$

$a^2 = \dfrac{(n^2 + 2n - 12)(n - 2)}{(n^3 - 13n + 24)},$

$y_0 = \dfrac{\Gamma(2m + 2)}{a\cdot 2^{2m+1}[\Gamma(m + 1)]^2}.$

The values of $\beta_2$ for the distribution, for various values of $n$, are as follows:

    Sample size, n     beta_2
          5            2.300
         10            2.570
         15            2.684
         20            2.750
         25            2.793
         50            2.889

Due to the effect of even moments higher than the fourth, the approximation afforded by the Type II curve is not reliable for samples containing fewer than about eight observations. As the sample size decreases below this limit, the extremes of the $C$ distribution deviate increasingly from the extremes ($\pm a$) of the fitted curve: with such a platykurtic distribution, therefore, the effect upon the lower significance levels vitiates the approximation.

Although either $\beta_2$ or the theoretical limits of the distribution of $C$ could have been employed as a parameter of the fitted curve, it was considered expedient to use the former. In any case, of course, the advantage to be gained would be in connection only with samples containing few observations (fewer than eight). The evidence afforded by empirical sampling indicates that use of the limits as a parameter might render the approximation less valid.

In order to facilitate use of the approximate distribution for samples of eight or more observations, the values of $C$ associated with two probability levels are tabulated below in Table I. The ratio of each value of $C$ to its standard error is also shown, to demonstrate the approach to normality. The significance levels recorded exclude 10% and 2% of the area under the curve, respectively. In most practical applications, these will be the 5% and 1% levels, respectively, since only positive values of $C$ exceeding the tabulated value will ordinarily be considered significant. The tabulations were prepared from tables of the function $I_x(p, q)$ [5], where $q = .5$ and $p = m + 1$, with the transformation

$x = 1 - \dfrac{C^2}{a^2}.$
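
The constants of the fitted curve are easy to recompute. The sketch below evaluates $\beta_2$, the Type II exponent $m$, the squared range $a^2$ and the standard error $\sigma_C = \sqrt{\mu_2}$ for a given $n$, and checks them against the standard Type II relations $\beta_2 = 3(2m + 3)/(2m + 5)$ and $\mu_2 = a^2/(2m + 3)$. The function names are hypothetical, and the two Type II relations are quoted here as known properties of that curve rather than taken from the paper.

```python
import math

def beta2(n):
    """Kurtosis of C for normal samples of size n."""
    return 3.0 * (n * n + 2 * n - 12) * (n - 1) * (n + 1) / ((n - 2) ** 2 * (n + 3) * (n + 5))

def type2_m(n):
    """Exponent m of the fitted Pearson Type II curve."""
    return (n ** 4 - n ** 3 - 13 * n ** 2 + 37 * n - 60) / (2.0 * (n ** 3 - 13 * n + 24))

def type2_a2(n):
    """Squared half-range a^2 of the fitted curve."""
    return (n * n + 2 * n - 12) * (n - 2) / float(n ** 3 - 13 * n + 24)

def sigma_c(n):
    """Standard error of C for normal samples: sqrt((n - 2) / ((n - 1)(n + 1)))."""
    return math.sqrt((n - 2) / float((n - 1) * (n + 1)))

if __name__ == "__main__":
    for n in (5, 10, 15, 20, 25, 50):
        print(n, round(beta2(n), 3), round(type2_m(n), 3), round(type2_a2(n), 4))
```

Dividing a tabulated significance level of $C$ by `sigma_c(n)` gives the standardized ratio that Table I reports alongside each level.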


TABLE I

Significance levels of the absolute value of C

    Sample size, n     P = .10              P = .02
                       C       C/sigma_C    C       C/sigma_C
          8            .5088   1.6486       .6686   2.1664
          9            .4878   1.6492       .6456   2.1826
         10            .4689   1.6494       .6242   2.1958
         11            .4517   1.6495       .6044   2.2068
         12            .4362   1.6495       .5860   2.2161
         13            .4221   1.6495       .5691   2.2241
         14            .4092   1.6494       .5534   2.2310
         15            .3973   1.6493       .5389   2.2369
         16            .3864   1.6492       .5254   2.2423
         17            .3764   1.6492       .5128   2.2470
         18            .3670   1.6491       .5011   2.2513
         19            .3583   1.6489       .4900   2.2550
         20            .3502   1.6488       .4797   2.2585
         21            .3426   1.6488       .4700   2.2616
         22            .3355   1.6486       .4609   2.2647
         23            .3288   1.6485       .4521   2.2676
         24            .3224   1.6484       .4440   2.2700
         25            .3165   1.6484       .4361   2.2717
    Normal (n = infinity)      1.6447               2.3262

The distribution of $C$ for normal samples containing 20 or more observations is sufficiently normal, for most practical cases and for the more common significance levels, to permit use of a table of areas under the normal curve, in conjunction with the standard error $\sigma_C = \sqrt{\dfrac{n - 2}{(n - 1)(n + 1)}}$. The 5% significance levels shown in Table I result, at worst, in a one per cent error of probability estimate, if the normal approximation is used in their place: that is, if 1.6447 times the standard error is used instead of the tabulated significance level, the probability will be .0505 at most, for the values of $n$ which are tabulated.

3. General discussion on the application of C. It may be wondered why the statistic $C$ has been used, rather than the more easily computed statistic $C' = \dfrac{\sum_{i=1}^{n-1}(X_i - X_{i+1})^2}{\sum_{i=1}^{n}(X_i - \bar X)^2}$. As far as a significance test is concerned, it clearly does not matter which is used, since $C$ and $C'$ are linearly related. However, $C$
may be regarded as symmetrically distributed about 0 in samples from a normal 
population to within at least four moments. Excessive departure of C from 0 
may be taken as indicative of the presence of non-randomness in the series, the 
actual significance test being based, of course, on the probability of obtaining a 
departure larger than a given observed one, under the assumption of a random 
series. Positive values of C, in general, correspond to positive correlation while 
negative values correspond to negative correlation between successive obser¬ 
vations. 

There are various ways of detecting non-randomness in a series of observations, 
such as regression methods, analysis of variance, etc. The use of regression 
methods implies that we must know in general the type of regression function 
to be tried. C is a very flexible statistic, on the other hand, for testing the null 
hypothesis that a series is random, no matter what the alternative hypothesis is. 
A thorough study of C as a statistic for testing the hypothesis of randomness in 
an ordered series should include a study of the power function of C for hypotheses 
specifying various types of non-randomness. However, we shall simply appeal 
to intuition in proposing the statistic C, and forego power function considerations 
in this note. In practice, the advantage of using C increases with the length of a 
series: lack of randomness in a single sequence of ten or less observations may 
ordinarily be detected by regression methods, in fitting a low order polynomial. 
In a longer sequence of measurements, on the other hand, the presence of com¬ 
plicated regression or of periodicity is often sufficiently obscured by variation 
to elude detection by any other than a flexible method. 

The statistic could be used to advantage in the field of applied statistics, in 
the investigation not only of variate series but of attribute series as well. For 
the latter purpose, an effort to tabulate the relationship between the level of 
significance and the percentage of either attribute would facilitate statistical 
investigation of random arrangement. A direct application could thus be made 
to binomially distributed attributes by a scalar assignment (0, 1) to the dichot¬ 
omy, followed by a procedure similar to that presented above. Similarly, the 
randomness of vectorial observations could be examined from the viewpoint of 
arrangement. The common method of treating such problems,—the “random 
walk method,”—has occasionally been found inadequate in dealing with specific 
forms of non-random order; this is especially true when the allocable cause of 
variation has a multi-directional effect. 

Needless to say, each of the fields of application considered so briefly above
would require development before a routine, efficient method of investigating
ordered arrangement could be established. Although probability level tables
have been provided in this paper for $C$ as applied to normal samples, it is quite
evident that tables for samples from other parent distributions would be needed
for some of the applications mentioned above.

4. An illustration of the use of C. Although one example has already 
been presented elsewhere [4] in which the distribution developed in Section 2 
has been employed, a typical application of the statistic to an example in the 
field of quality control will be given here in order to illustrate the mechanics of 
solution. The data presented in Table II represent the percentages of defective 
product turned out daily, over a period of twenty-four days, by a single workman. 
The total output each day closely approximates five hundred parts: this fact is
brought out to explain the calculation of $\chi^2$ for the observed series of percentages;
it has no bearing upon the use of $C$.

TABLE II

Percentage of product rejected

    Day       %, X       X^2        d^2
      1        7.4       54.76
      2        8.8       77.44      1.96
      3       11.4      129.96      6.76
      4       10.3      106.09      1.21
      5       11.9      141.61      2.56
      6       12.2      148.84       .09
      7       10.0      100.00      4.84
      8        8.4       70.56      2.56
      9        9.4       88.36      1.00
     10       10.9      118.81      2.25
     11        9.9       98.01      1.00
     12       11.8      139.24      3.61
     13       10.0      100.00      3.24
     14        8.9       79.21      1.21
     15        9.7       94.09       .64
     16        9.3       86.49       .16
     17       12.0      144.00      7.29
     18       12.3      151.29       .09
     19       10.3      106.09      4.00
     20        8.6       73.96      2.89
     21       10.4      108.16      3.24
     22       11.1      123.21       .49
     23        9.4       88.36      2.89
     24        8.2       67.24      1.44
    Totals   242.6     2495.82     55.42

$n\bar X^2 = 2452.28$
$\sum x^2 = 43.54$

$C = .3636$ (significant); $\chi^2 = 21.518$ (23 degrees of freedom) (not significant).

The value of $C$ derived from the data lies between the two significance levels tabulated in Table I; there is reason to believe that the data are ordered, or non-random. Computation of $\chi^2$, however, has been carried out with the hypothesis that all product was made under the same conditions (i.e. with a percentage defective equal to 10.108%, the mean of the group). The value so obtained is associated with a probability of about P = .50: the hypothesis is not disproved by this test. In short, the variability of the twenty-four observations could be considered random if it were not for the order of their arrangement.
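
The mechanics of the illustration can be reproduced directly from the daily percentages. The sketch below recomputes $C$ from the 24 values of Table II and compares it with the two significance levels tabulated for $n = 24$ in Table I (.3224 and .4440). The function and variable names are assumptions of this sketch; because it works from the rounded percentages, the value it gives (about .363) can differ from the printed .3636 in the last digit or two.

```python
import math

percent_defective = [
    7.4, 8.8, 11.4, 10.3, 11.9, 12.2, 10.0, 8.4, 9.4, 10.9, 9.9, 11.8,
    10.0, 8.9, 9.7, 9.3, 12.0, 12.3, 10.3, 8.6, 10.4, 11.1, 9.4, 8.2,
]

def young_c(seq):
    """C = 1 - sum of squared successive differences / (2 * sum of squared deviations)."""
    n = len(seq)
    mean = sum(seq) / n
    num = sum((seq[i] - seq[i + 1]) ** 2 for i in range(n - 1))
    den = 2.0 * sum((v - mean) ** 2 for v in seq)
    return 1.0 - num / den

def sigma_c(n):
    """Standard error of C for normal samples of size n."""
    return math.sqrt((n - 2) / float((n - 1) * (n + 1)))

if __name__ == "__main__":
    c = young_c(percent_defective)
    n = len(percent_defective)
    print("C =", round(c, 4), " C/sigma_C =", round(c / sigma_c(n), 3))
    print("between tabulated levels:", 0.3224 < c < 0.4440)
```

Only positive values of $C$ are treated as significant here, so the comparison is one-sided, in line with the reading of the tabulated levels as 5% and 1% points.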
ON CERTAIN LIKELIHOOD-RATIO TESTS ASSOCIATED WITH THE
EXPONENTIAL DISTRIBUTION


By Edward Paulson 
Washington, D.C. 

Various likelihood-ratio tests and their distributions in samples from a population having the elementary probability law $\dfrac{1}{\sigma}e^{-(x - B)/\sigma}\,dx$, $B < x < \infty$, have been studied by Neyman and Pearson [1] and Sukhatme [2]. In this note the power functions and the question of bias of several likelihood-ratio tests will be investigated. The exponential distribution appears to be appropriate for dealing with problems involving the intervals of time between events which tend to be random, as for example the interval between consecutive telephone calls, or the interval between consecutive accidents to the same worker.

To test the hypothesis $H'$ that the location parameter $B$ is equal to some fixed value, it being assumed that the scale parameter $\sigma$ is known, we can for simplicity take the set $\Omega$ of admissible populations from which the sample might have been drawn to be $\{-\infty < B < +\infty,\ \sigma = 1\}$, while the subset $\omega$ from which the sample must come when the hypothesis is true is $\{B = 0,\ \sigma = 1\}$. Then the likelihood-ratio $\lambda_1$ for testing this hypothesis is

$\lambda_1 = \dfrac{P(\omega\ \text{max.})}{P(\Omega\ \text{max.})} = \dfrac{e^{-\sum_{i=1}^{n}x_i}}{e^{-\sum_{i=1}^{n}(x_i - x_1)}} = e^{-nx_1},$

where $x_1$ is the smallest observation in a random sample of $n$. The region of acceptance of this hypothesis consists of all points in sample space for which

$\lambda_{1\alpha} < \lambda_1 < 1,$

where $\lambda_{1\alpha}$ is chosen so that $\int_{\lambda_{1\alpha}}^{1}g_1(\lambda_1)\,d\lambda_1 = 1 - \alpha$, $\alpha$ being the level of significance used and $g_1(\lambda_1)\,d\lambda_1$ being the distribution of $\lambda_1$ when $B$ is really equal to zero. The region $\lambda_{1\alpha} < \lambda_1 < 1$ is equivalent to the region in the sample space for which

$0 < x_1 < k_1; \qquad k_1 = -\dfrac{\log\lambda_{1\alpha}}{n}.$

For any value of $B$ the distribution of $x_1$ is known [3] to be

$\phi_1(x_1)\,dx_1 = ne^{-n(x_1 - B)}\,dx_1.$

Setting $B = 0$, the relationship between $k_1$ and $\alpha$ is $\int_0^{k_1}ne^{-nx_1}\,dx_1 = 1 - \alpha$, so

$e^{-nk_1} = \alpha.$

When $B < 0$, the power function $P(B)$ for this test is

$P(B) = 1 - \displaystyle\int_0^{k_1}ne^{-n(x_1 - B)}\,dx_1 = 1 - e^{nB}[1 - \alpha].$

When $0 < B < k_1$, $P(B) = 1 - \displaystyle\int_B^{k_1}ne^{-n(x_1 - B)}\,dx_1 = \alpha e^{nB}$. When $B > k_1$, $P(B) = 1$.

Since $e^{nB} > 1$ if $B > 0$ and also $e^{nB} < 1$ if $B < 0$, $P(B)$ is obviously $> \alpha$ if $B \ne 0$. This test is therefore completely unbiased in the sense of Daly [4]. In addition, it is not difficult to prove that this test has the unusual property of being a uniformly most powerful test with respect to all alternatives.
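
The three branches of this power function are simple enough to tabulate directly. The sketch below codes them for $\sigma = 1$, with $k_1$ obtained from $e^{-nk_1} = \alpha$; the function names and the chosen $n$ and $\alpha$ are illustrative assumptions.

```python
import math

def k1_from_alpha(n, alpha):
    """Critical value for the smallest observation: exp(-n * k1) = alpha."""
    return -math.log(alpha) / n

def power_h1(B, n, alpha):
    """Power of the test of B = 0 (sigma = 1 known) at the alternative B."""
    k1 = k1_from_alpha(n, alpha)
    if B < 0:
        return 1.0 - math.exp(n * B) * (1.0 - alpha)
    if B < k1:
        return alpha * math.exp(n * B)
    return 1.0

if __name__ == "__main__":
    n, alpha = 10, 0.05
    for B in (-0.5, -0.1, 0.0, 0.1, 0.2, k1_from_alpha(n, alpha), 0.5):
        print(round(B, 4), round(power_h1(B, n, alpha), 4))
```

The printed values equal $\alpha$ only at $B = 0$ and exceed it elsewhere, which is the unbiasedness property stated above.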

To test the hypothesis $H''$ that the location parameter is equal to some fixed value, say $B = 0$, when the scale parameter $\sigma$ is unknown, the likelihood-ratio is easily seen to be

$\lambda_2 = \left[\dfrac{\sum_{i=1}^{n}(x_i - x_1)}{\sum_{i=1}^{n}x_i}\right]^n = \left[\dfrac{1}{1 + \dfrac{nx_1}{\sum_{i=1}^{n}(x_i - x_1)}}\right]^n.$

The region of acceptance consists of all points in the sample space for which $\lambda_{2\alpha} < \lambda_2 < 1$, where $\int_{\lambda_{2\alpha}}^{1}g_2(\lambda_2)\,d\lambda_2 = 1 - \alpha$. This is equivalent to the region

(1) $0 < t < k_2; \qquad t = \dfrac{nx_1}{\left[\dfrac{\sum_{i=1}^{n}(x_i - x_1)}{n - 1}\right]}, \qquad k_2 = (n - 1)\,\dfrac{1 - \lambda_{2\alpha}^{1/n}}{\lambda_{2\alpha}^{1/n}}.$

The relation between $k_2$ and $\alpha$ is easily found from the distribution of $t$ when $B = 0$, which is known to be [3]

$\phi_2(t)\,dt = \dfrac{dt}{\left[1 + \dfrac{t}{n - 1}\right]^n}.$

Therefore $\int_0^{k_2}\phi_2(t)\,dt = 1 - \alpha$, so $\left[1 + \dfrac{k_2}{n - 1}\right]^{-(n-1)} = \alpha$.

It is somewhat easier to find the power function of this test by considering the region of acceptance as made up of points in the $x_1$, $s$ plane for which

$0 < x_1 < \dfrac{k_2s}{n}$, where $s = \dfrac{\sum_{i=1}^{n}(x_i - x_1)}{n - 1}$,

which is identical with the region in (1).

The joint distribution of $x_1$ and $s$ is [3]

$f(x_1, s)\,dx_1\,ds = \phi_3(x_1)\,\phi_4(s)\,dx_1\,ds,$

where

$\phi_3(x_1)\,dx_1 = \dfrac{n}{\sigma}\,e^{-n(x_1 - B)/\sigma}\,dx_1$

and

$\phi_4(s)\,ds = \left(\dfrac{n - 1}{\sigma}\right)^{n-1}\dfrac{s^{n-2}e^{-(n-1)s/\sigma}}{(n - 2)!}\,ds.$

When $B < 0$, the power function $P(B)$ of this test is

$P(B) = 1 - \displaystyle\int_0^\infty ds\int_0^{k_2s/n}f(x_1, s)\,dx_1 = 1 - e^{nB/\sigma}[1 - \alpha].$

When $B > 0$, the power function is

(2) $P(B) = 1 - \displaystyle\int_{nB/k_2}^{\infty}ds\int_B^{k_2s/n}f(x_1, s)\,dx_1 = \alpha e^{nB/\sigma} + I\!\left[n - 1;\ \dfrac{n(n - 1)B}{k_2\sigma}\right] - \alpha e^{nB/\sigma}\,I\!\left[n - 1;\ \dfrac{n(n - 1 + k_2)B}{k_2\sigma}\right],$

where $I[p;\ x] = \dfrac{\Gamma_x(p)}{\Gamma(p)} = \dfrac{1}{\Gamma(p)}\displaystyle\int_0^x t^{p-1}e^{-t}\,dt$, which is the form in which the Incomplete Gamma Function has been tabulated [5].

Since $\sigma$ must be positive, $e^{nB/\sigma} < 1$ if $B < 0$ and therefore $P(B) > \alpha$ in the interval $-\infty < B < 0$. To show that $P(B)$ is $> \alpha$ in the interval $0 < B < \infty$, it is simpler to work with the expression for $P(B)$ as a double integral in (2), than to differentiate the power function directly. Performing the integration with respect to $x_1$,

$P(B) = 1 + \displaystyle\int_{nB/k_2}^{\infty}\left[e^{-(k_2s - nB)/\sigma} - 1\right]\phi_4(s)\,ds.$

Differentiating with respect to $B$,

$P'(B) = \displaystyle\int_{nB/k_2}^{\infty}\dfrac{n}{\sigma}\,e^{-(k_2s - nB)/\sigma}\,\phi_4(s)\,ds.$

The integral expression for $P'(B)$ is obviously positive. Therefore since for $B > 0$ the derivative is always positive the function must be monotonically increasing in this interval ($0 < B < +\infty$), so $P(B)$ is $> \alpha$ when $B > 0$. Therefore this test is also completely unbiased.
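
The monotonic behaviour claimed for $B > 0$ can be observed numerically from the single-integral form $P(B) = 1 + \int_{nB/k_2}^{\infty}[e^{-(k_2s - nB)/\sigma} - 1]\,\phi_4(s)\,ds$. The sketch below evaluates that integral with a midpoint rule, taking $\phi_4$ to be the gamma-type density of $s$ given above. The function names, the truncation of the upper limit and the step count are assumptions of this sketch.

```python
import math

def k2_from_alpha(n, alpha):
    """Solve [1 + k2/(n - 1)]^(-(n - 1)) = alpha for k2."""
    return (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)

def phi4(s, n, sigma):
    """Density of s = sum(x_i - x_1)/(n - 1) for exponential samples."""
    r = (n - 1) / sigma
    return math.exp((n - 1) * math.log(r) + (n - 2) * math.log(s)
                    - r * s - math.lgamma(n - 1))

def power_pos(B, n, alpha, sigma=1.0, steps=20000):
    """P(B) for B >= 0 via midpoint integration over s (upper limit truncated)."""
    k2 = k2_from_alpha(n, alpha)
    lo = n * B / k2
    h = 40.0 * sigma / steps
    total = 0.0
    for j in range(steps):
        s = lo + (j + 0.5) * h
        total += (math.exp(-(k2 * s - n * B) / sigma) - 1.0) * phi4(s, n, sigma)
    return 1.0 + h * total

def power_neg(B, n, alpha, sigma=1.0):
    """P(B) for B < 0 in closed form."""
    return 1.0 - math.exp(n * B / sigma) * (1.0 - alpha)

if __name__ == "__main__":
    for B in (0.0, 0.1, 0.3, 0.6):
        print(B, round(power_pos(B, 5, 0.05), 4))
```

At $B = 0$ the routine should return the significance level itself, and the values should increase with $B$, in agreement with the sign of $P'(B)$.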

We now consider the hypothesis $H'''$ that two samples are drawn from exponential distributions with the same location parameter, assuming it is known the samples must have come from two exponential distributions with the same scale parameter. Given a sample of values of $x$ drawn from $\dfrac{1}{\sigma}e^{-(x - B_1)/\sigma}\,dx$ and another independent sample of values of $y$ drawn from $\dfrac{1}{\sigma}e^{-(y - B_2)/\sigma}\,dy$, the hypothesis we wish to test is that $B_2 = B_1$. Let $x_1$ be the smallest of the $n_1$ values of $x$ and $y_1$ be the smallest of the $n_2$ values of $y$; let $L$ be the smallest of the $n_1 + n_2 = N$ values of both $x$ and $y$. Then the likelihood ratio for this hypothesis is

$\lambda_3 = \left[\dfrac{\sum_{i=1}^{n_1}(x_i - x_1) + \sum_{i=1}^{n_2}(y_i - y_1)}{\sum_{i=1}^{n_1}(x_i - L) + \sum_{i=1}^{n_2}(y_i - L)}\right]^N = \left[\dfrac{1}{1 + \dfrac{z}{u}}\right]^N,$

where

$z = n_2(y_1 - x_1)$, if $y_1 > x_1$,
$z = n_1(x_1 - y_1)$, if $x_1 > y_1$,

and

$u = \sum_{i=1}^{n_1}(x_i - x_1) + \sum_{i=1}^{n_2}(y_i - y_1).$

The region of acceptance, $\lambda_{3\alpha} < \lambda_3 < 1$, is equivalent to the region $0 < z < k_3u$, where $k_3$ is again a function of $\alpha$, the level of significance, the exact relation being

$\displaystyle\int_0^{k_3}\dfrac{(N - 2)\,dt}{(1 + t)^{N-1}} = 1 - \alpha$, so $\dfrac{1}{(1 + k_3)^{N-2}} = \alpha.$

It is known [3] that $u$ is independent of $z$, and that its distribution is

$\phi_5(u)\,du = \dfrac{u^{N-3}e^{-u/\sigma}\,du}{\sigma^{N-2}(N - 3)!}.$

The distribution of $z$ is somewhat complicated; but it can be derived by observing that the probability that $z$ lies in any infinitesimal interval $z_1 \pm \tfrac12 dz_1$ is the sum of the probabilities that $n_2(y_1 - x_1)$ and $n_1(x_1 - y_1)$ lie in that interval and by then using standard methods for finding the distribution of the difference of two variates. For the case $G = B_2 - B_1 > 0$, the distribution $f(z)$ of $z$ is

(3) $f_1(z)\,dz = \dfrac{e^{-n_1G/\sigma}}{(n_1 + n_2)\sigma}\left[n_1e^{n_1z/(n_2\sigma)} + n_2e^{-z/\sigma}\right]dz, \qquad 0 < z < n_2G,$

$\quad\ \ f_2(z)\,dz = \dfrac{\left[n_1e^{n_2G/\sigma} + n_2e^{-n_1G/\sigma}\right]e^{-z/\sigma}}{(n_1 + n_2)\sigma}\,dz, \qquad n_2G < z < \infty.$

For the case $G < 0$, the distribution of $z$ can be derived from (3) by interchanging $n_1$ and $n_2$ and putting $-G$ in place of $G$.
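
A quick consistency check on a two-branch density of this kind is that it integrates to one and is continuous at the junction $z = n_2G$. The sketch below does this numerically for the form $f_1(z) = e^{-n_1G/\sigma}[n_1e^{n_1z/(n_2\sigma)} + n_2e^{-z/\sigma}]/(N\sigma)$ on $0 < z < n_2G$ and $f_2(z) = [n_1e^{n_2G/\sigma} + n_2e^{-n_1G/\sigma}]e^{-z/\sigma}/(N\sigma)$ beyond, which is the reading of (3) assumed here; it also computes $k_3$ from $(1 + k_3)^{-(N-2)} = \alpha$. Function names, sample sizes and integration settings are illustrative assumptions.

```python
import math

def z_density(z, n1, n2, G, sigma=1.0):
    """Assumed two-branch density of z for G >= 0 (reading of (3))."""
    N = n1 + n2
    if z < n2 * G:
        return (math.exp(-n1 * G / sigma)
                * (n1 * math.exp(n1 * z / (n2 * sigma)) + n2 * math.exp(-z / sigma))
                / (N * sigma))
    return ((n1 * math.exp(n2 * G / sigma) + n2 * math.exp(-n1 * G / sigma))
            * math.exp(-z / sigma) / (N * sigma))

def k3_from_alpha(N, alpha):
    """Solve (1 + k3)^(-(N - 2)) = alpha for k3."""
    return alpha ** (-1.0 / (N - 2)) - 1.0

def total_probability(n1, n2, G, sigma=1.0, upper=60.0, steps=120000):
    """Midpoint-rule integral of the density over (0, upper)."""
    h = upper / steps
    return h * sum(z_density((j + 0.5) * h, n1, n2, G, sigma) for j in range(steps))

if __name__ == "__main__":
    print(total_probability(4, 6, 0.5))
    print(k3_from_alpha(10, 0.05))
```

With $G = 0$ both branches collapse to $e^{-z/\sigma}/\sigma$, the distribution of $z$ under the hypothesis tested.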

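The power function of this test can now be derived. For the case $G > 0$, the power function $P(G)$ is

(4) $P(G) = 1 - \left\{\displaystyle\int_0^{n_2G/k_3}du\int_0^{k_3u}f_1(z)\phi_5(u)\,dz + \int_{n_2G/k_3}^{\infty}du\int_0^{n_2G}f_1(z)\phi_5(u)\,dz + \int_{n_2G/k_3}^{\infty}du\int_{n_2G}^{k_3u}f_2(z)\phi_5(u)\,dz\right\}.$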

Upon integrating out and simplifying, the power function becomes 


P(G) 


a(*L 

\n, • 


«j + m 


) +/ [*- 2 '£0 


-njO/w 


Tii n* 


(. 


G(n* — Tiifcj) 
h<r 


The power function when $G < 0$ is easily derived from that for $G > 0$ by everywhere interchanging $n_1$ and $n_2$ and substituting $-G$ for $G$.

To show that $P(G) > \alpha$ when $G \ne 0$, it is only necessary to show that the derivative $P'(G)$ of the power function is always positive when $G > 0$, and always negative when $G < 0$. It is again considerably simpler to use the expression for $P(G)$ as a double integral. For the case $G > 0$, integrating with respect to $z$ in (4),


P(G) 


nt 


fit -I - nt 


[1-e 


-G(ni+n a )/«-i 


+ 


I 


n,0/fc| nie nxGlv 


nt + nt 




(nte n » g/ ' -I- n*e- nial ') 
f o/*i n\ + th 


■l-e-^oMdu, 


where $[f(x)]_a^b = f(b) - f(a)$. Upon differentiating and simplifying,

P'(G) « _ e -*.«/^( u ) du 

{til + Th)* Jo 


+ r e~ i,M/, [e n,a/r - e~ nialr \4n(u)du. 

(nt + n»)<r Jn t o/k, 


Both integrals are easily seen to be always positive, so $P'(G)$ is positive when
$G > 0$. In the same manner it can be shown that $P'(G)$ is negative when $G < 0$.
Therefore this test is also completely unbiased.

The questions of the bias of the likelihood-ratio tests for (a) testing the
hypothesis that $\sigma = \sigma_0$ when B is known and (b) testing the hypothesis
that $\sigma = \sigma_0$, nothing being known about the value of B, are practically
identical with the analogous problems for a normal distribution. The results
are also the same, for the $\lambda$ test for (a) is completely unbiased, while that for
(b) is biased.

ON THE MATHEMATICALLY SIGNIFICANT FIGURES IN THE
SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS

By L. B. Tuckerman
The National Bureau of Standards

1. Introduction. The number of mathematically significant figures in the 
solution of simultaneous linear equations has received attention from a number 
of writers [1-6]. It is an important subject, not only in least squares and 
correlations, but in many other problems of science where simultaneous equa¬ 
tions arise: it may not be amiss, therefore, to examine it from a fresh start, 
particularly since (as will be shown) some of the rules that have been published 
on it fail in certain frequently occurring circumstances. 

2. Definitions. Before proceeding into the subject it will be necessary to dis¬ 
tinguish between the computer's terms “significant figures” and “determinate 
significant figures.” The former are the figures that compose a number, without 
the consecutive ciphers that precede or follow them, merely to locate the decimal 
point. “Determinate significant figures,” on the other hand, are figures that 
are justifiable on computational grounds. From the computer's point of view, 
the number of significant figures remains independent of what is statistically 
significant. To avoid confusion in what follows, the term “significant figures” 
will be used in the computer's sense, and the adjective “determinate” will be 
supplied where mathematical determinacy is implied. 

To avoid prolixity the term “observational error” will include any uncertainty
arising either from errors in the observations or from the statistical nature of
the problem (e.g. sampling errors, grouping errors, etc.). The observational error
of the result is independent of the particular sequence of computation followed and
the accuracy with which it is carried out.

The term “computational error” will include all the additional uncertainties
arising from the approximations occurring in the particular sequence of computation
used, including the “rounding off” of the final result. The computational
errors, unlike the observational errors, depend in general upon the sequence of the
intermediate steps used in the computation as well as on the number of significant
figures to which they are carried.

3. Criterion of an adequate computation. If the number written down at the 
end of a computation is to serve its purpose the maximum possible computational 
error must be suitably limited. 

A decimal representation of a number containing $f$ significant figures is subject
to an uncertainty (upper limit of absolute error) of 5 in the $(f + 1)$th place.
It has, therefore, a possible relative (not absolute) error of representation somewhere
between $5 \times 10^{-(f+1)}$ and $5 \times 10^{-f}$, in magnitude. This relative
computational error sets the limit to any valid final rounding off. Regardless of the
accuracy to which the intermediate steps of the computation have been carried,
this relative computational error introduced by the final rounding off alone
must be suitably limited.
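
The range quoted above can be checked numerically. The following is a minimal sketch in Python, not part of the original paper; the function names are hypothetical and the sample values are arbitrary. It rounds a few numbers to f significant figures and confirms that the relative error of representation never exceeds 5 × 10^-f, the larger end of the range, which is approached when the leading figures are close to 1000...

```python
import math


def round_to_sig_figs(value: float, f: int) -> float:
    """Round a nonzero value to f significant figures."""
    exponent = math.floor(math.log10(abs(value)))  # decimal position of the leading figure
    return round(value, f - 1 - exponent)


def relative_rounding_error(value: float, f: int) -> float:
    """Relative error committed by rounding value to f significant figures."""
    return abs(round_to_sig_figs(value, f) - value) / abs(value)


def worst_case_relative_error(f: int) -> float:
    """Upper end of the range in the text: 5 in the (f+1)th place of a
    number whose leading figure is 1, that is 5 x 10^-f."""
    return 5.0 * 10.0 ** (-f)


# Arbitrary sample values (assumed for illustration only).
for value in (1.0049, 3.14159265, 9.9949, 123456.789):
    for f in (2, 3, 4):
        assert relative_rounding_error(value, f) <= worst_case_relative_error(f)
```

A number such as 9.9949 stays near the smaller end of the range, while 1.0049 comes close to the larger end, which is why the text gives a range rather than a single figure.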

In case all of the accuracy obtainable from the data is not needed in the result, 
the sum of the maximum possible computational error (including the error of 
the final rounding off) and the maximum possible observational error must be 
kept below the error which can be tolerated in the result. 

In case all of the accuracy obtainable from the data is needed in the result, 
the maximum possible computational error in the result (including the error of 
the final rounding off) must be negligible in comparison with the uncertainty 
(observational error) in the result arising from uncertainty in the data. Just 
how small a fraction of the observational error is “negligible” is necessarily a matter 
of judgment, and will depend upon the nature of the problem. A computational 
error that would be wholly negligible in some ordinary computations might be 
intolerably large in the adjustment of an accurate geodetic survey. In any case 
the only basis for a valid judgment of the adequacy of the computation lies in a 
comparison of (i) the maximum possible computational error that can arise in 
the sequence of computations including the final “rounding off,” with (ii) the
observational error of the result arising from the observational errors inherent 
in the data. 


4. Propagation of error in a system of linear equations. Assume that

(1) $$\sum_i a_{si} x_i = b_s, \qquad s = 1, 2, \cdots, n,$$

is a set of simultaneous linear equations derived in some way from observations
and in which the coefficients $a_{si}$ and the absolute terms $b_s$ may all be subject to
observational error. If the relative (not absolute) observational error of a
quantity $q$ be represented by $\delta q$ it may readily be seen that

(2)

$$\delta x_j = -\sum_h \sum_k (x_k/x_j) A_{hj} a_{hk}\,\delta a_{hk} + \sum_s (b_s/x_j) A_{sj}\,\delta b_s,$$

$$\delta A = \sum_h \sum_k A_{hk} a_{hk}\,\delta a_{hk},$$

where A is the determinant of the coefficients $a_{hk}$, and $A_{hk}$ is the term
corresponding to $a_{hk}$ in the reciprocal (not the adjoint) determinant.


5. Upper limits to observational errors. The sign and magnitude of the
relative errors $\delta a_{hk}$ and $\delta b_s$ are unknown, but we shall assume that it is possible
in any problem to assign to them upper limits

$|\delta a_{hk}|$ and $|\delta b_s|$

which in magnitude they cannot exceed. If the problem is such that the values
of each of the $\delta a_{hk}$ and the $\delta b_s$ are wholly independent of each other, it is then
possible that their magnitudes may all reach their upper limits $|\delta a_{hk}|$ and $|\delta b_s|$
simultaneously, in which case upper bounds of $\delta x_j$ and $\delta A$ may be placed at

(3)

$$|\delta x_j| = \sum_h \sum_k |(x_k/x_j) A_{hj} a_{hk}|\,|\delta a_{hk}| + \sum_s |(b_s/x_j) A_{sj}|\,|\delta b_s|,$$

$$|\delta A| = \sum_h \sum_k |A_{hk} a_{hk}|\,|\delta a_{hk}|.$$

6. Indefiniteness of the problem in the general case. The values of the $\delta a_{hk}$
and $\delta b_s$ may not be independent of each other, in which circumstance knowledge
of the law of their dependence would make it possible to assign upper limits to
the magnitudes of $\delta x_j$ and $\delta A$. These upper limits can not be larger than the
upper bounds shown in equation (3), and in special cases they will be much
smaller. Since the dependence of $\delta a_{hk}$ and $\delta b_s$ may in general have any form
whatever, cases can and will occur in which the upper limits of the relative
errors $\delta x_j$ and $\delta A$ may have any ratio whatever.

7. Case of independent errors. Any general discussion of the errors that can
occur in $x_j$ and A must be based either on some special assumption or on the
limiting assumption that the errors are independent. It is this latter assumption
that underlies the usual discussion, and will be the basis of what follows.
Equation (3) gives the upper limit to the $\delta x_j$ and $\delta A$ under these assumptions.

8. The ratios of $|\delta x_j|$ and $|\delta A|$ are still indefinite in spite of the assumption
of independent errors in the coefficients. However, equation (3) does not determine
any definite ratio or inequality between the upper bounds $|\delta x_j|$ and $|\delta A|$.
The nature of the observations may be such that some of the errors in the $a_{hk}$
and $b_s$ are very small and some relatively large. Not infrequently it is safe to
assume that some of them are free from appreciable error and to ascribe all the
error of the $x_j$ to the error in one or two of the $a_{hk}$ or $b_s$. If any statement of a
definite relationship, either as an equality or an inequality between $|\delta A|$ and
the $|\delta x_j|$ is valid for all possible sets of linear equations, it must at least hold
in the special case in which the errors of all the $b_s$ and the errors of all except one
of the $a_{hk}$ are negligible.

If such a statement of a definite general relationship between these upper
limits of errors can be made, it must be possible to write down an equation or an
inequality between any one of the expressions $|A_{hk}|$ and some or all of the
corresponding expressions $|(x_k/x_j)A_{hj}|$, $j = 1, 2, \cdots, n$, that will remain true
no matter what be the values of the $a_{hk}$ and the $b_s$ in the original set of simultaneous
equations. It is obvious that the ratio of $|A_{hk}|$ and $|(x_k/x_j)A_{hj}|$,
$(j \neq k)$, depends upon the values of the $a_{hk}$, and sets of equations can be found
to give any assigned value to that ratio. It is therefore impossible to state any
rule that will restrict the ratio of the relative error of A and the relative error
of any one of the $x_j$, valid for all possible sets of linear equations.

9. Definite statement about the sum of the relative errors in the unknowns.
However, in the summation $\sum_j |\delta x_j|$ there occurs the term corresponding to
$j = k$, for which $|(x_k/x_j)A_{hj}| = |A_{hk}|$, so that under the assumption that the
$a_{hk}$ and $b_s$ are independent sources of error, we may write the inequality

(4) $$\sum_j |\delta x_j| \geq |\delta A|$$

which states that the sum of the upper bounds to the relative errors of all the $x_j$
cannot be less than the upper bound to the relative error of the determinant A.
A corresponding statement can easily be proved for the standard deviations.

A limiting case can be constructed in which the inequality (4) reduces to

(5) $$\sum_j |\delta x_j| = |\delta A|$$

and in which all of the $|\delta x_j|$ are equal. For this case,

(6) $$|\delta A| = n\,|\delta x_j| \quad \text{for all values of } j.$$

If $n < 10$ it is obvious that there will be at least one more determinate significant
figure in each of the $x_j$ than in the determinant A of the coefficients.

It is frequently assumed that the number of determinate significant figures in
the solution for any unknown cannot exceed the number of determinate significant
figures in the determinant A of the coefficients. We see now that this statement
can not be generally valid, even under the assumption that the $a_{hk}$ and $b_s$
are independent sources of error. As a matter of fact, it is necessary in some
cases to compute some or even all of the unknowns to more significant figures
than are determinate in the determinant A of the coefficients, if one would retain
in the result all the accuracy that is obtainable from the data.

Cases in which the relative observational error of every one of the unknowns
is less than the relative error of the determinant A probably occur rarely in
practice; in fact the only ones that I have seen are those that I constructed
purposely to show that such a thing is possible. However, cases in which the
relative errors of one or several but not all of the unknowns are much smaller
than the relative error of the determinant A, occur fairly frequently.

10. Remarks on the case of “near indeterminacy.” The major interest in 
curve fitting centers around the condition of “near indeterminacy,” i.e., of a 
small or near vanishing determinant A. Even in the circumstance where the 
relative error of the determinant is much greater than the relative error of some 
or all of the coefficients and absolute terms, the relative error of one or more of 
the unknowns may be much smaller than the relative error of the determinant, 
as may be seen from what follows.

In accurate experimentation the endeavor is, wherever possible, to arrange the
experiment so that the quantity sought comes directly from the measurement as
represented by an equation such as

(7) $x = p$.

However, so ideal an experimental arrangement is rarely if ever possible, and it
is a common experience to find that the measurements are represented by an
equation such as

(8) $x + qy + rz + su + \cdots = p$,

where $qy$, $rz$, $su$, etc., are small corrections that must somehow be evaluated.
For simplicity, the discussion will be confined to the almost trivial case

(9) $x + qy = p$.

Not infrequently the only way the correction can be evaluated is to rearrange the
conditions of the experiment so that another equation is obtained in the form

(10) $x + q'y = p'$.

Sometimes the nature of the experiment is such that it is not possible to change
the coefficient of $y$ by more than a small amount, under which conditions

(11) $q' = q(1 + \beta)$,

and

(12) $p' = p(1 + \alpha)$,

where $\beta$ and $\alpha$ are small in comparison with 1. The solution of equations (9)
and (10) now gives

(13) $$x = \frac{\begin{vmatrix} p & q \\ p' & q' \end{vmatrix}}{\begin{vmatrix} 1 & q \\ 1 & q' \end{vmatrix}} = \frac{pq' - p'q}{q' - q} = p(1 - \alpha/\beta).$$


The quantity $q' - q$ seen in the denominator of this equation is the determinant
A of the coefficients, and by equation (11) its value is $\beta q$. Since $\beta q$ is assumed
to be small here, the solution for $x$ encounters a near vanishing denominator.
It would, however, be wrong to assume that the number of determinate significant
figures in $x$ that can be obtained by solving the equations is necessarily
limited to the number of determinate significant figures in the denominator A.

If the experimenter has been fortunate in finding suitable experimental conditions,
the denominator $A = \beta q$, although small in comparison with either $q'$ or $q$,
will still not cause difficulty. It will be observed that the coefficients of $q'$ and $q$
in the denominator are equal (both being unity). Now if the coefficients $p$
and $p'$ in the numerator are nearly enough equal, so that $q'$ and $q$ occur in both
numerator and denominator so nearly proportionally that the uncertainties in
$q$ and $q'$ produce nearly compensating errors in both numerator and denominator,
then $x$ will be given to more determinate significant figures than are found in
the denominator A. It can then be said that the experiment is successful in
evaluating the correction term $qy$ in equation (9).

On the other hand, in less fortunate circumstances, to the exasperation of the
experimenter, the denominator $A = q' - q = \beta q$ is not only small, but $p'$ and $p$,
although still nearly equal, differ enough so that the errors in $q'$ and $q$ are not
compensated by the nearly equal coefficients in the numerator. The experiment
will then fail to improve the approximation $p$ for $x$ by failing to evaluate the
small correction $qy$ in equation (9). This would be an inherent defect in the
experiment and could not be removed by any manner of computation.

The same conclusion would of course be drawn from the coefficient of $p$ (viz.,
$1 - \alpha/\beta$) at the extreme right of equation (13). It is not the size of $\beta$ that
alone determines the number of determinate significant figures in $x$, it is rather
the ratio between $\alpha$ and $\beta$. In the fortunate experimental circumstances described
above, the near equality of $p'$ and $p$ offsets the near equality of $q'$ and $q$
by reducing the term $\alpha/\beta$ to a value small compared with unity; the term $\alpha/\beta$,
being small, acts to reduce the effect of the uncertainties in $q$ and $q'$ (i.e., in $q$
and $\beta$) in the evaluation of $x$. On the other hand, in less fortunate circumstances,
the correction term $\alpha/\beta$ can not now shield $x$ from the uncertainties in $q$
and $q'$ since the relative difference $\alpha$ between $p$ and $p'$ is not small enough to
reduce $\alpha/\beta$ to innocuity.
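
The contrast between the fortunate and the less fortunate circumstances can be made concrete with a short numerical sketch in Python. It is not part of the original paper. The values of p, q, α, β and the assumed relative error in q' are chosen only for illustration, and the function names are hypothetical. The sketch perturbs q' slightly and compares the resulting relative change in x with the relative change in the denominator q' − q.

```python
def solve_x(p, q, p_prime, q_prime):
    """x from equations (9) and (10): x + q*y = p and x + q'*y = p'."""
    return (p * q_prime - p_prime * q) / (q_prime - q)


def relative_shift_in_x(alpha, beta, rel_error_in_q_prime, p=1.0, q=1.0):
    """Relative change in x when q' is in error by the given relative amount."""
    p_prime = p * (1.0 + alpha)
    q_prime = q * (1.0 + beta)
    x_true = solve_x(p, q, p_prime, q_prime)
    x_perturbed = solve_x(p, q, p_prime, q_prime * (1.0 + rel_error_in_q_prime))
    return abs(x_perturbed - x_true) / abs(x_true)


beta = 1e-2    # assumed: q' differs from q by 1 per cent
error = 1e-4   # assumed relative error in q'

# Relative error that this produces in the denominator q' - q = beta * q.
denominator_error = (1.0 + beta) * error / beta

# alpha/beta = 0.01: the near equality of p' and p compensates the error.
fortunate = relative_shift_in_x(alpha=1e-4, beta=beta, rel_error_in_q_prime=error)
# alpha/beta = 0.5: the compensation largely fails.
unfortunate = relative_shift_in_x(alpha=5e-3, beta=beta, rel_error_in_q_prime=error)

print(denominator_error, fortunate, unfortunate)
```

With these assumed values the denominator is uncertain by about 1 per cent in both cases, yet x moves by only about one hundredth of that when α/β = 0.01, and by nearly the full 1 per cent when α/β = 0.5.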


11. Numerical illustration of compensating errors. As a “horrible example”
especially constructed to emphasize the theoretical possibilities, take the following
special case:

(14) $$\begin{cases} 1000.10000x + 10.00000y = 1010.10000 \\ 1000.00000x + 10.00000y = 1010.00000 \end{cases}$$

wherein it is assumed that the coefficients and the absolute terms (assumed to
be derived from the observational data) are all correct to the fifth decimal place
as given, and no closer estimate of their errors is possible. So far as known, the
upper limit to the absolute observational error of each is then the same, i.e.
$5 \times 10^{-6}$, but the coefficients of $x$ ($a_{11}$ and $a_{21}$), and the absolute terms ($b_1$ and $b_2$),
all have nine determinate significant figures, while the coefficients of $y$ ($a_{12}$
and $a_{22}$), have only seven. Thus,

$$|\delta a_{11}| \not> 5 \times 10^{-9}, \quad |\delta a_{21}| \not> 5 \times 10^{-9}, \quad |\delta b_1| \not> 5 \times 10^{-9}, \quad |\delta b_2| \not> 5 \times 10^{-9},$$

but

(15) $$|\delta a_{12}| \not> 5 \times 10^{-7}, \quad |\delta a_{22}| \not> 5 \times 10^{-7},$$

and $x = 1$, $y = 1$, $A = 1$, whereupon a substitution of values from (15) into (3)
gives the inequalities

(16) $$|\delta x| \not> 3 \times 10^{-4}, \quad |\delta y| \not> 3 \times 10^{-2}, \quad |\delta A| \not> 1.01 \times 10^{-2}.$$

So far as known, the determinant A may thus be in error by as much as 1 per 
cent, and y by as much as 3 per cent, yet x is known closer than 1/30th per cent.
Here the value of the unknown x cannot be adequately represented by less than 
four significant figures, and might even require five, in spite of the fact that 
neither A nor y requires more than three significant figures to represent all that 
is certainly known about them. 
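
The figures of this example can be reproduced with a few lines of Python. The sketch below is not part of the original paper. It evaluates the upper bounds of equation (3) for a system of two equations, on the assumption that the term of the reciprocal determinant belonging to a coefficient is the cofactor of that coefficient divided by the determinant. The function name and argument layout are hypothetical, and the relative error limits are taken as 5 × 10^-6 divided by the magnitude of each quantity rather than the rounded values 5 × 10^-9 and 5 × 10^-7.

```python
def error_bounds_2x2(a, b, rel_a, rel_b):
    """Upper bounds of equation (3) for two equations in two unknowns.

    a, rel_a: 2x2 nested lists of the coefficients and of the upper limits
    of their relative errors; b, rel_b: the same for the absolute terms.
    """
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # recip[h][k]: cofactor of a[h][k] divided by the determinant (assumed
    # reading of "the term corresponding to a_hk in the reciprocal determinant").
    recip = [[a[1][1] / det, -a[1][0] / det],
             [-a[0][1] / det, a[0][0] / det]]
    # Solution x_j = sum over s of recip[s][j] * b[s].
    x = [sum(recip[s][j] * b[s] for s in range(2)) for j in range(2)]
    bound_x = []
    for j in range(2):
        from_a = sum(abs(x[k] / x[j] * recip[h][j] * a[h][k]) * rel_a[h][k]
                     for h in range(2) for k in range(2))
        from_b = sum(abs(b[s] / x[j] * recip[s][j]) * rel_b[s] for s in range(2))
        bound_x.append(from_a + from_b)
    bound_det = sum(abs(recip[h][k] * a[h][k]) * rel_a[h][k]
                    for h in range(2) for k in range(2))
    return x, det, bound_x, bound_det


a = [[1000.1, 10.0], [1000.0, 10.0]]
b = [1010.1, 1010.0]
limit = 5e-6  # upper limit of absolute error: 5 in the sixth decimal place
rel_a = [[limit / abs(v) for v in row] for row in a]
rel_b = [limit / abs(v) for v in b]

x, det, bound_x, bound_det = error_bounds_2x2(a, b, rel_a, rel_b)
print(x, det, bound_x, bound_det)
```

Under these assumptions the bounds come out close to 3 × 10^-4 for x, 3 × 10^-2 for y and 1.01 × 10^-2 for the determinant, so that x is bounded about a hundred times more closely than y and some thirty times more closely than A.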

The reason for this disparity in relative errors can be more easily seen by
substituting numerical values for all the coefficients in the expression for $x$
except $a_{12}$ and $a_{22}$. The possible relative errors of $a_{12}$ and $a_{22}$ are, as noted
above, about 100 times as great as the possible relative errors of $a_{11}$, $a_{21}$, $b_1$,
and $b_2$, and are the controlling errors in A. In the solution

(17) $$x = \frac{1010.10000\,a_{22} - 1010.00000\,a_{12}}{1000.10000\,a_{22} - 1000.00000\,a_{12}},$$

however, both $a_{12}$ and $a_{22}$ occur in both numerator and denominator, and moreover
the coefficient of each in the numerator is nearly equal to its coefficient in
the denominator, so that a change in either $a_{12}$ or $a_{22}$ changes both numerator
and denominator nearly proportionally, with the result that their ratio $x$ is
known much more accurately than either the numerator or the denominator A.

This kind of compensation of errors in a computation is not confined to the 
solution of simultaneous equations (and it is not an infrequent occurrence in 
other computations). This is one of the many reasons why it is impossible to 
give general rules for the retention of significant figures that will be valid for 
all types of computations. 

12. Geometrical analogy. Moulton [4] illustrated his reasoning by the fol¬ 
lowing geometrical analogy. The solution of three linear equations is equivalent 
to finding the point of intersection of three planes. When the determinant of 
the coefficients is small in comparison with the coefficients themselves, these 
planes are either nearly parallel, or the line of intersection of any two of them 
is nearly parallel to the third. In these cases small uncertainties in the location 
of any one of the planes correspond to large uncertainties in the position of their 
point of intersection. 

In the first circumstance the planes might all be nearly parallel to one of the
three coordinate planes, with the result that large uncertainty would afflict the
value of the determinant and two of the unknowns, the third being much more
accurately determined.

In the second circumstance, the line of intersection of two of the planes might
be nearly parallel to one of the coordinate axes. When that happens, large
uncertainty will afflict the value of the determinant, but only one of the unknowns,
the other two being much more accurately determined.

This geometrical analogy can be extended to cover simultaneous equations 
with any number of unknowns. Near-vanishing of the determinant A of the 
coefficients necessarily implies relatively large uncertainties in the determinant 
and also in at least one of the unknowns, but not necessarily in all of them. 
These are, of course, very special cases, but, as noted above, they are of frequent 
occurrence in actual problems. 


13. Evaluation of computational error. The relative computational error in 
Xj must be kept within certain definite limits which depend upon the particular 
problem to be solved (section 3). To do this it is necessary to be able to calcu¬ 
late an upper bound to the relative computational error inherent in any particular 
sequence of computations. 

In many computations it is easy to write down a simple formula that will set
an upper bound to the relative computational error involved in that particular
sequence. This formula contains numbers $f_1$, $f_2$, $f_3$, etc., each representing the
number of significant figures accurately computed at some particular step.
Once a simple formula for relative computational error is written down, it is
easy to choose values of $f_1$, $f_2$, $f_3$, etc. that will give an upper bound to the
relative computational error not larger than the permissible limit of maximum
possible computational error outlined in section 3. This method of determining
an upper bound of the relative computational error should be used whenever such
a simple formula can be found. For example, to compute $x$ from equation (13)
we may use the following sequence: $r_1 = q' - q$, $r_2 = r_1/q = \beta$, $r_3 = p' - p$,
$r_4 = r_3/p = \alpha$, $r_5 = r_4/r_2 = \alpha/\beta$, $r_6 = 1 - r_5 = 1 - \alpha/\beta$,
$r_7 = p r_6 = p(1 - \alpha/\beta) = x$.
$x$ may then be written as a function of these partial results, viz.:

(18) $$x = r_7 = p r_6 = p(1 - r_5) = p(1 - r_4/r_2) = p\left(1 - \frac{r_3}{p r_2}\right) = p\left(1 - \frac{r_3 q}{p r_1}\right).$$

Applying first order error theory we find

(19) $$|\epsilon(x)| \leq \left|\frac{\alpha/\beta}{1 - \alpha/\beta}\right| \left\{ |\epsilon(r_1)| + |\epsilon(r_2)| + |\epsilon(r_3)| + |\epsilon(r_4)| + |\epsilon(r_5)| \right\} + |\epsilon(r_6)| + |\epsilon(r_7)|$$

where $\epsilon(r_i)$ represents the relative error in $r_i$ arising from the computation by
which $r_i$ was determined from the preceding partial results, $r_1, r_2, \cdots, r_{i-1}$,
and $\epsilon(x)$ is the total relative computational error in $x$ when so computed. It
is easy to keep $\epsilon(x)$ within any desired limits by suitably limiting each error term
of (19). Since a computation accurate to $f$ significant figures involves a relative
computational error not greater than $5 \times 10^{-f}$, any desired limits can then be
set to each error term of (19) by a proper choice of the number of significant
figures that should be carried in that step.

Unfortunately there seem to be no reasonably simple formulae for determining
upper bounds of the relative computational errors that arise in the solution of
simultaneous linear equations in more than two variables. This does not absolve
the computer from the necessity of ensuring that his computational errors
are suitably limited.
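
For the two-unknown sequence described earlier in this section, the choice of significant figures per step can be tried out with a short Python sketch. It is not part of the original paper. It assumes that the factor multiplying the first five error terms of (19) is the magnitude of (α/β)/(1 − α/β), which is what first order error theory gives for the step r_6 = 1 − r_5. The function name is hypothetical.

```python
def bound_19(alpha, beta, figures):
    """First-order upper bound on the relative computational error in
    x = p(1 - alpha/beta), following the assumed form of equation (19).

    figures: seven integers f_1..f_7, the significant figures carried in
    the steps r_1..r_7; each step contributes at most 5 x 10^-f.
    """
    if len(figures) != 7:
        raise ValueError("expected seven step precisions f_1..f_7")
    eps = [5.0 * 10.0 ** (-f) for f in figures]
    ratio = alpha / beta
    amplification = abs(ratio / (1.0 - ratio))  # |r_5 / r_6|
    return amplification * sum(eps[:5]) + eps[5] + eps[6]


# Assumed illustrative values: alpha/beta = 0.01.
all_six = bound_19(1e-4, 1e-2, [6] * 7)
four_then_six = bound_19(1e-4, 1e-2, [4] * 5 + [6, 6])
print(all_six, four_then_six)
```

When α/β is small the first five steps are strongly damped, so they can be carried to fewer figures than the last two without much effect on the bound. When α/β approaches unity the amplification grows without limit and those steps need correspondingly more figures.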

The method I have found most economical is to carry the solution of simultaneous
linear equations to the capacity of the machine, and as each partial result
$r_i$ is obtained, write it as

$r_i(1 \pm \epsilon_i)$,

where $r_i$ is the value actually found and $\epsilon_i$ is a positive number representing the
accumulation of uncertainty introduced by all preceding steps in the computation.
At the end of the computation each of the unknowns is found in the form

(20) $x_j(1 \pm \epsilon_j)$,

where $x_j$ represents the value found and $\epsilon_j$ is the upper bound of the relative
computational error in $x_j$.

A comparison of $\epsilon_j$ with the upper bound of the observational error $|\delta x_j|$ of
equation (3) will then indicate whether the computation is adequate. If the
comparison shows that the computation was inadequate, it will show in which
steps the number of significant figures $f_i$ was too small, and by how much.
The computer can recompute, carrying these steps to the requisite number of
figures with the assurance that his recomputation will then be adequate. The
comparison will further indicate in which steps if any the number of significant
figures $f_i$ was larger than necessary.
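
The bookkeeping described above, in which every partial result is carried together with a bound on its accumulated relative error, can be sketched in Python as follows. This is an illustration under stated assumptions and not the author's procedure in detail. The class and function names are hypothetical, the propagation rules are the usual first order ones, each step is charged a rounding error of 5 × 10^-f, and the floating-point error of Python itself is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """A partial result written as value * (1 +/- eps)."""
    value: float
    eps: float = 0.0  # upper bound on the accumulated relative error


def rounding_error(f):
    """Relative error charged to a step carried to f significant figures."""
    return 5.0 * 10.0 ** (-f)


def subtract(a, b, f):
    value = a.value - b.value
    inherited = (abs(a.value) * a.eps + abs(b.value) * b.eps) / abs(value)
    return Partial(value, inherited + rounding_error(f))


def multiply(a, b, f):
    return Partial(a.value * b.value, a.eps + b.eps + rounding_error(f))


def divide(a, b, f):
    return Partial(a.value / b.value, a.eps + b.eps + rounding_error(f))


# Assumed data, treated as exact: p = q = 1, alpha = 1e-4, beta = 1e-2.
f = 6
step = rounding_error(f)
p, q = Partial(1.0), Partial(1.0)
p_prime, q_prime = Partial(1.0001), Partial(1.01)

r1 = subtract(q_prime, q, f)        # q' - q
r2 = divide(r1, q, f)               # beta
r3 = subtract(p_prime, p, f)        # p' - p
r4 = divide(r3, p, f)               # alpha
r5 = divide(r4, r2, f)              # alpha / beta
r6 = subtract(Partial(1.0), r5, f)  # 1 - alpha/beta
r7 = multiply(p, r6, f)             # x

print(r7.value, r7.eps)
```

With every step carried to the same number of figures, the accumulated bound on the last partial result equals |r_5/r_6| times five step errors plus two more step errors. Comparing that bound with the observational bound on x then shows which steps deserve more or fewer figures.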

When a computer has thus set suitable upper bounds to the relative computa¬ 
tional error in the solution of a set of linear equations, he is in a position to plan 
solutions of future similar sets so as to perform his computations more eco¬ 
nomically and yet safely. This is especially true when the solution of simulta¬ 
neous linear equations arises week after week in routine testing. 

14. Conclusions. Summary rules have been published, purporting to be safe 
guides to computers in avoiding needless work, and ensuring that the computa¬ 
tions are carried to a sufficient degree of accuracy. Many of them are useful 
guides for certain types of computation and for limited ranges of the numerical 
values entering into the computation, but none of those that I have seen can be 
used generally. The only safe rule, where the matter is of importance, is to 
calculate the maximum possible computational error that can enter in the par¬ 
ticular sequence of computation followed, and make sure that it is kept within 
the necessary limits. 

It is sometimes necessary to carry the intermediate steps of a computation to 
many significant figures beyond the significant figures given in the data, or kept 
in the result. The relative error of one of the unknowns may be very much 
smaller than the relative errors of the data from which it is computed, while the
relative error of another of the unknowns may be larger. The methods of
ensuring that the computations are adequate are outlined in section 13.

For the best sequence to follow in the elimination of the unknowns, I shall 
pass along a suggestion of Dr. W. Edwards Deming which he gave in one of our 
discussions of this subject. I venture to pass it along, because it has worked in 
every special case that I have constructed in an attempt to prove that it does
not hold generally. If ever the suggestion fails, the computer may change the 
sequence; but in any case he is obliged, as stated above, to calculate the maximum 
possible computational error that can enter into his calculations. Dr. Deming’s 
suggestion is this: “To evaluate some but not all of the unknowns to the highest 
possible computational accuracy, retaining as few significant figures as possible 
in the intermediate steps, solve the equations by successive elimination, elimi¬ 
nating first and evaluating last the unknowns of greatest inherent relative 
accuracy.” 

15. Summary. Expressions are given for the maximum observational error 
in the unknowns of a system of simultaneous linear equations, in terms of the 
relative errors of the coefficients and absolute terms therein. In order to extract 
all the information possible from a system of linear equations representing ob¬ 
servational results, it is not sufficient in general to assume that the relative errors 
in the unknowns are as large as the relative error in the determinant of the 
system. In many problems the computation of some of the unknowns must 
therefore be carried to more significant figures than are determinate in the 
determinant of the system. Methods are outlined for evaluating computational 
error in the solution of linear equations to ensure that the computations are 
adequate. 

In conclusion I wish to express my thanks to Dr. W. Edwards Deming who 
has given much of his time to assist me in the preparation of this paper. He has 
made valuable suggestions on the material to be included and the general manner 
of presentation. In addition he has criticized the manuscript in detail and 
assisted in the final revision.

ON MECHANICAL TABULATION OF POLYNOMIALS
By J. C. McPherson
International Business Machines Corporation

1. Introduction. The purpose of this paper is to show how automatic
accounting machines, which have been used previously in evaluating such
quantities as $\sum x^n$ and $\sum x^{n-1} y$, may be used in the preparation of mathematical
tables of integral powers, of polynomials, and of functions which can be
approximated by polynomials. These tables may be prepared for any desired intervals
of the argument such as 1, xfoj, £, etc. 

The method is an adaptation of the general theory of “cumulative” or “pro¬ 
gressive” totals which has proved useful in computing moments and product 
moments both with and without accounting machines. The reader unfamiliar 
with the mathematical method and its machine applications might refer to such 
presentations as those of Hardy [1], Mendenhall and Warren [2, 3], Razram and 
Wagner [4], Brandt [5], and Dwyer [6, 7]. The main feature of the method is 
the computation of summed products or of summed powers by means of succes¬ 
sive cumulated additions. It is shown in this paper how it is possible to use 
this same process in constructing tables of powers and tables of polynomials. 


2. The Cumulative Formulas. If the numbers $F_x$ are defined and finite
for $x = 1, 2, 3, \cdots, (a - 1), a$, and if these values of $F_x$ are cumulated for
$x = a$, $x = a - 1$, etc., then the value in the row headed by $x = 1$ can be written
as ${}^{1}T_1$. If these cumulations are cumulated successively with the superscript
indicating the order of the cumulation and the subscript indicating the value of $x$
which heads the row, then

¥ 

2rp __ -v 8 rp ^ (X * 4 “ ]■)% jp fm V X(X 1 ) p 

r,-s —2T~ F - t >- z —W- F - 

*Ti •" 2 — — ^ F 


and in general for $i < j$,

(1) $${}^{j}T_i = \sum_x \frac{[x + j - (i+1)]^{(j-1)}}{(j-1)!}\,F_x.$$


Formula (1) is basic to much of the previous work involving cumulative totals.
Various authors have studied such important special cases as (A) where $F_x$
equals the frequency function $f_x$, (B) where $F_x = x f_x$, and (C) where $F_x$ equals
the sum of all the values of $y$ having the same $x$ value. These special cases have
been found very useful in computing moments and product moments.

The moments may be expressed in terms of the cumulations in a variety of
ways. The diagonal formulas have the differences of zero as coefficients and are
expressed in terms of ${}^{1}T_1$, ${}^{2}T_1$, ${}^{3}T_2$, ${}^{4}T_3$, ${}^{5}T_4$, etc. The columnar formulas,
whose coefficients have been recently studied [6, 7], are expressed in terms of
cumulations of the same order, ${}^{j}T_i$, with $j$ fixed. Razram and Wagner [4]
have given formulas which utilize the entries of different rows and different
columns but which demand fewer entries for the formulas. Razram and Wagner
worked out the formulas through $\sum x^4 f_x$ but the argument holds for $\sum x^4 F_x$.
For purposes of comparison the values of $\sum x^i F_x$, $i = 0, 1, 2, 3, 4$, as they appear
in the diagonal, columnar, and Razram-Wagner systems are presented in
Table I.


TABLE I

Values of $\sum x^i F_x$ for $i = 0, 1, 2, 3, 4$.

Columns: Diagonal, Columnar, Razram-Wagner. Rows: $\sum F_x$, $\sum x F_x$, $\sum x^2 F_x$, $\sum x^3 F_x$, $\sum x^4 F_x$. [Table entries missing.]


In developing the theory of the later sections of this paper I have developed
further formulas of the type shown by Razram and Wagner since these formulas
have fewer terms than do those of the other systems and the coefficients are
factorable by $(j - 1)!/2$. These formulas for $\sum x^s F_x$, with $s$ even, feature such terms
as ${}^{3}T_1 + {}^{3}T_2 = {}^{3}T_{1+2}$, etc., so that there are two entries from the same
column. For the purposes of this paper it is preferable to have a single entry
from each column and this situation results from continued application of the
formula

(2) $${}^{j}T_{i+(i+1)} = {}^{j}T_i + {}^{j}T_{i+1} = {}^{j-1}T_i + 2\,{}^{j}T_{i+1}.$$

The formulas for $\sum x^s F_x$ with $s \leq 12$ are given. The alternative forms are given
for the formulas involving even values of $s$.

(3)

$\sum F_x = {}^{1}T_1$, $\quad \sum x F_x = {}^{2}T_1$,

$\sum x^2 F_x = {}^{3}T_1 + {}^{3}T_2 = {}^{3}T_{1+2} = {}^{2}T_1 + 2\,{}^{3}T_2$,

$\sum x^3 F_x = {}^{2}T_1 + 6\,{}^{4}T_2$,

$\sum x^4 F_x = {}^{3}T_{1+2} + 12\,{}^{5}T_{2+3}$
$\quad = {}^{2}T_1 + 2\,{}^{3}T_2 + 12\,{}^{4}T_2 + 24\,{}^{5}T_3$,

$\sum x^5 F_x = {}^{2}T_1 + 30\,{}^{4}T_2 + 120\,{}^{6}T_3$,

$\sum x^6 F_x = {}^{3}T_{1+2} + 60\,{}^{5}T_{2+3} + 360\,{}^{7}T_{3+4}$
$\quad = {}^{2}T_1 + 2\,{}^{3}T_2 + 60\,{}^{4}T_2 + 120\,{}^{5}T_3 + 360\,{}^{6}T_3 + 720\,{}^{7}T_4$,

$\sum x^7 F_x = {}^{2}T_1 + 126\,{}^{4}T_2 + 1680\,{}^{6}T_3 + 5040\,{}^{8}T_4$,

$\sum x^8 F_x = {}^{3}T_{1+2} + 252\,{}^{5}T_{2+3} + 5040\,{}^{7}T_{3+4} + 20160\,{}^{9}T_{4+5}$
$\quad = {}^{2}T_1 + 2\,{}^{3}T_2 + 252\,{}^{4}T_2 + 504\,{}^{5}T_3 + 5040\,{}^{6}T_3 + 10080\,{}^{7}T_4 + 20160\,{}^{8}T_4 + 40320\,{}^{9}T_5$,

$\sum x^9 F_x = {}^{2}T_1 + 510\,{}^{4}T_2 + 17640\,{}^{6}T_3 + 151200\,{}^{8}T_4 + 362880\,{}^{10}T_5$,

$\sum x^{10} F_x = {}^{3}T_{1+2} + 1020\,{}^{5}T_{2+3} + 52920\,{}^{7}T_{3+4} + 604800\,{}^{9}T_{4+5} + 1814400\,{}^{11}T_{5+6}$
$\quad = {}^{2}T_1 + 2\,{}^{3}T_2 + 1020\,{}^{4}T_2 + 2040\,{}^{5}T_3 + 52920\,{}^{6}T_3 + 105840\,{}^{7}T_4 + 604800\,{}^{8}T_4 + 1209600\,{}^{9}T_5 + 1814400\,{}^{10}T_5 + 3628800\,{}^{11}T_6$,

$\sum x^{11} F_x = {}^{2}T_1 + 2046\,{}^{4}T_2 + 168960\,{}^{6}T_3 + 3160080\,{}^{8}T_4 + 19958400\,{}^{10}T_5 + 39916800\,{}^{12}T_6$,

$\sum x^{12} F_x = {}^{3}T_{1+2} + 4092\,{}^{5}T_{2+3} + 506880\,{}^{7}T_{3+4} + 12640320\,{}^{9}T_{4+5} + 99792000\,{}^{11}T_{5+6} + 239500800\,{}^{13}T_{6+7}$
$\quad = {}^{2}T_1 + 2\,{}^{3}T_2 + 4092\,{}^{4}T_2 + 8184\,{}^{5}T_3 + 506880\,{}^{6}T_3 + 1013760\,{}^{7}T_4 + 12640320\,{}^{8}T_4 + 25280640\,{}^{9}T_5 + 99792000\,{}^{10}T_5 + 199584000\,{}^{11}T_6 + 239500800\,{}^{12}T_6 + 479001600\,{}^{13}T_7$.

The derivation of these formulas is obtained with the use of (1), with the use of

(4) ${}^{j}T_{i} = {}^{j}T_{i+1} + {}^{j-1}T_{i}$,

and with the use of formulas of lower order. For example we have from (1)

$$\sum \frac{(x + 4)(x + 3)(x + 2)(x + 1)x}{5!}\,F_x = {}^{6}T_{1}$$

so that

$$\sum x^{5}F_x = 120\,{}^{6}T_{1} - 10 \sum x^{4}F_x - 35 \sum x^{3}F_x - 50 \sum x^{2}F_x - 24 \sum xF_x$$

which after substitution of $\sum x^{4}F_x$, $\sum x^{3}F_x$, etc. and simplification results in the
value ${}^{2}T_{1} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$.
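
The fifth-moment formula can be checked numerically. The sketch below is only an illustration: it assumes that the cumulations are descending running totals (the grand total stands opposite x = 1), and the helper names `cumulate` and `T` and the frequencies in `F` are invented for the example.

```python
def cumulate(column):
    """Descending cumulation: entry k is the sum of column[k:], so the grand total is at the top."""
    totals, running = [], 0
    for value in reversed(column):
        running += value
        totals.append(running)
    return totals[::-1]


def T(F, j, i):
    """j-th cumulation of the frequencies F (F[0] belongs to x = 1), read opposite x = i."""
    column = list(F)
    for _ in range(j):
        column = cumulate(column)
    return column[i - 1] if i <= len(column) else 0


F = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up frequencies for x = 1, ..., 8
direct = sum(x**5 * f for x, f in enumerate(F, start=1))
from_cumulations = T(F, 2, 1) + 30 * T(F, 4, 2) + 120 * T(F, 6, 3)
print(direct, from_cumulations)  # the two numbers should agree
```

Only three entries of the cumulation table are read, one from each of the second, fourth and sixth columns, which is the saving the formulas of this type are meant to give.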

3. Tables of powers. If $F_x = 1$ when $x = a$, but is zero otherwise,
then $\sum x^{s}F_x$ is equal to $a^{s}$. It follows that the value of $a^{s}$ can be obtained from
the successive cumulations of this $F_x$ with the use of (3). For example in
Table II

TABLE II

Cumulations of $F_x = 1$, when $x = 6$,
0, when $x \ne 6$.

$6^{2} = {}^{2}T_{1} + 2\,{}^{3}T_{2} = 6 + 2(15) = 36$,

$6^{3} = {}^{2}T_{1} + 6\,{}^{4}T_{2} = 6 + 6(35) = 216$,

$6^{4} = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 12\,{}^{4}T_{2} + 24\,{}^{5}T_{3} = 6 + 2(15) + 12(35) + 24(35) = 1296$.

The values of ${}^{2}T_{1}$, ${}^{3}T_{2}$, ${}^{4}T_{2}$ and ${}^{5}T_{3}$ for $a = 6$ are italicised in Table II.

To get the values of $5^{2}$, $5^{3}$, $5^{4}$, etc. it would be necessary to start to cumulate
from $x = 5$. Now since the values of ${}^{1}T_{i}$ are unity, it follows that the values
for $a = 5$ can be found by taking the entries above those for $a = 6$. Thus
${}^{2}T_{1} = 5$, ${}^{3}T_{2} = 10$, ${}^{4}T_{2} = 20$, ${}^{5}T_{3} = 15$ with $5^{2} = 5 + 2(10)$, $5^{3} = 5 + 6(20)$,
$5^{4} = 5 + 2(10) + 12(20) + 24(15)$. It is evident in general that the values for
any $a^{2}$, $a^{3}$, $a^{4}$ can be obtained by taking the row headed by $a$ as the bottom row.
Thus using $a = 8$, we have $8^{2} = 8 + 2(28)$, $8^{3} = 8 + 6(84)$, etc. It then appears
that we may omit the $x$ column of Table II and consider the cumulations to be
ascending cumulations for $a$ rather than descending cumulations for $x$.

A more satisfactory course is to cumulate the coefficients so as to eliminate
the multiplications. Thus the value of $6\,{}^{j}T_{i}$ could be obtained without
multiplication by cumulating 6, 0, 0, 0, 0, ... rather than 1, 0, 0, 0, ... . Several
cumulations may be carried on at the same time so that the additions are not
necessary and the tabulation results in a table of the desired powers.

In preparation of a power table, the formulas (3) become a series of
instructions on the way in which we are to do the cumulating. For instance the
formula:

$x^{7} = 5040\,{}^{8}T_{4} + 1680\,{}^{6}T_{3} + 126\,{}^{4}T_{2} + {}^{2}T_{1}$,

tells us that to form a table of the seventh power we must cumulate¹ the
coefficient 5040 eight times; add in the coefficient 1680 when there are six operations;
the coefficient 126 when there are four; and the coefficient 1 when there are two
remaining. A change in subscript tells us that the coefficient when first included
forms a separate total ahead of the ones already partly figured. When the
subscript does not change, the coefficient is to be included in the first summary card
total. The final cumulating operation prints the actual table.

To prepare a power table by machine we secure a set of cards punched all alike
with the numbers from 1 to 9 punched diagonally in successive columns across
the card. The machine is wired to add the coefficient of the highest term by
selecting the proper digits from the diagonals, cumulate after each card and
summary punch each total. This way of starting saves one cumulation. The
summary cards are cumulated repeatedly in the same manner until the number
of operations indicated by the highest term is completed. When the number of
operations remaining equals $j$ of another term ${}^{j}T_{i}$, a card for the coefficient of
that term is included in the tabulation ahead of the summary cards. This
automatically adds the new coefficient to each term of the series. When the
subscript $i$ in ${}^{j}T_{i}$ changes, the new coefficient card must form a separate total;
when it does not change, the coefficient card must tabulate in the first summary
card total.

¹ This operation is generally known as progressive totalling in machine operation.

To illustrate the tabulation of power tables, the formula for the cube table is:

$x^{3} = 6\,{}^{4}T_{2} + {}^{2}T_{1}$.

The successive operations yield the following table:


TABLE III

            Operation number
  x      1      2      3   4: x^3
  1      0      0      1       1
  2      6      6      7       8
  3      6     12     19      27
  4      6     18     37      64
  5      6     24     61     125
  6      6     30     91     216
  7      6     36    127     343
  8      6     42    169     512
  9      6     48    217     729
 10      6     54    271    1000

In actual machine work, operation 1 can be omitted and work begun with
operation 2. The machine is set to add the coefficient 6 of the highest term from
each card and an accumulated total is printed and punched for each card
tabulated, giving the results shown under operation 2. An additional card is punched
for the coefficient of the second term, 1, and placed ahead of the cards produced
in operation 2. The cumulation and punching is repeated, giving the results
shown under operation 3. The summary cards from this operation are
cumulatively tabulated, giving the results shown under operation 4, which is the
table of cubes desired.

Similarly, for a table of the fourth power, the formula $x^{4} = 24\,{}^{5}T_{3} + 12\,{}^{4}T_{2} + 2\,{}^{3}T_{2} + {}^{2}T_{1}$ indicates the following operations:


TABLE IV

                Operation number
  x      1      2      3      4   5: x^4
  1      0      0      0      1       1
  2      0     12     14     15      16
  3     24     36     50     65      81
  4     24     60    110    175     256
  5     24     84    194    369     625
  6     24    108    302    671    1296
  7     24    132    434   1105    2401
  8     24    156    590   1695    4096
  9     24    180    770   2465    6561
 10     24    204    974   3439   10000
 11     24    228   1202   4641   14641
 12     24    252   1454   6095   20736

Note that in operation 3, where the subscript does not change, the coefficient 2 is
added to the first card punched by the machine, while in operation 4, where it
changes, the coefficient 1 appears as a separate total.
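
The machine procedure for Tables III and IV can be imitated in a few lines. This is a sketch rather than a description of the actual wiring: each term $c\,{}^{j}T_{i}$ is written as a triple `(c, j, i)`, a coefficient card is modelled as adding `c` at row `i` when `j` operations remain, and the function name `tabulate` is invented for the example.

```python
from itertools import accumulate


def tabulate(terms, rows):
    """terms holds (coefficient, j, i) for each coefficient * jT_i; rows is the table length.

    Returns one column per operation; the last column is the finished table.
    """
    operations = max(j for _, j, _ in terms)
    column = [0] * rows
    history = []
    for remaining in range(operations, 0, -1):
        column = list(column)                  # fresh copy of the previous summary totals
        for coefficient, j, i in terms:
            if j == remaining:                 # term needs exactly `remaining` more cumulations
                column[i - 1] += coefficient   # coefficient card enters at row i
        column = list(accumulate(column))      # one progressive-total run
        history.append(column)
    return history


cubes = tabulate([(6, 4, 2), (1, 2, 1)], rows=10)
fourths = tabulate([(24, 5, 3), (12, 4, 2), (2, 3, 2), (1, 2, 1)], rows=12)
for x, value in enumerate(cubes[-1], start=1):
    print(x, value)
```

The intermediate columns in `cubes` and `fourths` correspond to the operation columns of the two tables, so a slip in any one run shows up as a disagreement with the printed column.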

4. Tables of polynomials. To tabulate values of $f(x) = a + bx + cx^{2} + \cdots$
(where $a, b, c, \dots$ are positive or negative coefficients) the method is similar
to that of preparing power tables except that the coefficients to be added are
determined by multiplying the coefficients of the formulas for the different powers
by the values $a$, $b$, $c$ etc., adding the coefficients of like terms in the various
formulas, and using these resultant coefficients in place of the simple coefficients
used in the power tables. Thus if we wish to tabulate values of $f(x) = 4 + 3x + 2x^{2} + x^{5}$
the coefficients are found as follows:

  $4x^{0} = 4\,{}^{1}T_{0}$
  $+\,3x = +\,3\,{}^{2}T_{1}$
  $+\,2x^{2} = +\,2\,{}^{2}T_{1} + 2 \cdot 2\,{}^{3}T_{2}$
  $+\,x^{5} = +\,{}^{2}T_{1} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$

  $f(x) = 4\,{}^{1}T_{0} + 6\,{}^{2}T_{1} + 4\,{}^{3}T_{2} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$

This equation gives instructions to perform six operations with 120 as
coefficient; adding the coefficient 30 as a separate total when there are 4 operations
remaining; adding 4 to the first summary card total when there are 3 operations;
adding 6 as a separate total when there are 2 operations remaining; and adding 4
on the last operation.
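
The combining of like terms can be sketched as follows. Two things here are assumptions of the example and not part of the text: the dictionary `POWER_FORMULAS` simply transcribes the needed rows of (3) as `(coefficient, j, i)` triples, and `evaluate` uses the closed form $\binom{x-i+j-1}{j-1}$ for a unit card entered at row $i$ and cumulated $j$ times.

```python
from collections import Counter
from math import comb

# (coefficient, j, i) triples of formulas (3): x**s = sum of coefficient * jT_i
POWER_FORMULAS = {
    0: [(1, 1, 0)],
    1: [(1, 2, 1)],
    2: [(1, 2, 1), (2, 3, 2)],
    5: [(1, 2, 1), (30, 4, 2), (120, 6, 3)],
}


def combine(poly):
    """poly maps power -> polynomial coefficient; returns {(j, i): merged cumulative coefficient}."""
    merged = Counter()
    for power, a in poly.items():
        for c, j, i in POWER_FORMULAS[power]:
            merged[(j, i)] += a * c
    return dict(merged)


def evaluate(merged, x):
    """Value of the table at row x, using the assumed closed form for each cumulated card."""
    return sum(c * comb(x - i + j - 1, j - 1) for (j, i), c in merged.items())


coeffs = combine({0: 4, 1: 3, 2: 2, 5: 1})
print(coeffs)
print([evaluate(coeffs, x) for x in range(0, 11)])
```

The merged coefficients are the five that appear in the expression for $f(x)$ above, and the evaluated rows are the values the final operation should print.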

The first few totals appear thus:


TABLE V

                      Operation number
  x      1      2      3       4       5   6: f(x)
  0                                              4
  1      0      0      0       0       6       10
  2      0      0     30      34      40       50
  3    120    120    150     184     224      274
  4    120    240    390     574     798     1072
  5    120    360    750    1324    2122     3194
  6    120    480   1230    2554    4676     7870
  7    120    600   1830    4384    9060    16930
  8    120    720   2550    6934   15994    32924
  9    120    840   3390   10324   26318    59242
 10    120    960   4350   14674   40992   100234


It is not necessary to confine these tables to values for whole numbers, as we
can tabulate equally well values of $f(x)$ for intervals of $x$ of .1, .01 or .001, or
for fractional intervals. In this case, before combining formulas for different powers we
multiply both sides by the desired interval raised to the power to which $x$ is raised in
that particular formula, then add like terms as before.

To tabulate the previous example in .1x intervals we proceed as follows:

  $4x^{0} = 4.000\,{}^{1}T_{0}$
  $3x/10 = +\,.3\,{}^{2}T_{1}$
  $2(x/10)^{2} = +\,.02\,{}^{2}T_{1} + .04\,{}^{3}T_{2}$
  $(x/10)^{5} = +\,.00001\,{}^{2}T_{1} + .00030\,{}^{4}T_{2} + .00120\,{}^{6}T_{3}$

  $f(x) = 4\,{}^{1}T_{0} + .32001\,{}^{2}T_{1} + .04\,{}^{3}T_{2} + .00030\,{}^{4}T_{2} + .00120\,{}^{6}T_{3}$


TABLE VI

                         Operation number
  x       1       2       3       4         5    6: f(x)
  1       0       0       0       0    .32001    4.32001
  2       0       0   .0003   .0403    .36031    4.68032
  3   .0012   .0012   .0015   .0418    .40211    5.08243
  4   .0012   .0024   .0039   .0457    .44781    5.53024
  5   .0012   .0036   .0075   .0532    .50101    6.03125
  6   .0012   .0048   .0123   .0655    .56651    6.59776
  7   .0012   .0060   .0183   .0838    .65031    7.24807
  8   .0012   .0072   .0255   .1093    .75961    8.00768
  9   .0012   .0084   .0339   .1432    .90281    8.91049
 10   .0012   .0096   .0435   .1867   1.08951   10.00000


Where any coefficients are negative in the equations expressed in ${}^{j}T_{i}$ terms,
they are simply added in as minus figures.

To round off the preceding function to 3 decimal places, we add 5 to the
constant term ${}^{1}T_{0}$ in the position to the right of the last decimal retained, i.e. in
this case the 4th decimal place. The constant term is then 4.0005.


    Exact     Counter reads    Prints
  4.32001        4.32051        4.320
  4.68032        4.68082        4.680
  5.08243        5.08293        5.082
  5.53024        5.53074        5.530
  6.03125        6.03175        6.031
  6.59776        6.59826        6.598
  7.24807        7.24857        7.248
  8.00768        8.00818        8.008
  8.91049        8.91099        8.910
 10.00000       10.00050       10.000
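
The rounding device amounts to adding half a unit of the last retained place once, in the constant term, and then printing only the retained digits. A minimal sketch with exact decimal arithmetic follows; the function name `printed` is an invention of the example.

```python
from decimal import Decimal, ROUND_DOWN


def printed(exact, places=3):
    """Return (counter reading, printed value) when half a unit of the last place is pre-added."""
    half_unit = 5 * Decimal(10) ** -(places + 1)      # 0.0005 for three places
    counter = exact + half_unit
    shown = counter.quantize(Decimal(10) ** -places, rounding=ROUND_DOWN)  # drop the extra digits
    return counter, shown


for value in ("4.32001", "5.08243", "6.59776"):
    print(value, *printed(Decimal(value)))
```

Because the half unit is carried along by the cumulation, every printed line comes out rounded without any separate rounding step.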


5. Automatic calculation of polynomial coefficients. Frequently when
polynomials are being evaluated, the process of forming the coefficients can be
performed automatically from a punched-card table. Such a table consists of a
set of cards for each power $x^{s}$ containing the multiples of all the coefficients of
each of the terms ${}^{j}T_{i}$ in the formula (3) for that power. These multiples are
1, 2, 3, 4, ..., 9; 10, 20, 30, 40, ..., 90; 100, 200, ..., 900; 1000, 2000 etc.,
and may be produced automatically by making a linear table of each coefficient
in the manner described in this paper. Each card is punched with the
information called for by the heading of the following card form:


   s     j     i    multiple    coeff. × multiple
   07    06    03    00005           008400


The particular figures indicated are those which would be punched for the
term $5(1680)\,{}^{6}T_{3}$ in the representation of $5x^{7}$ according to formula (3).

The table is used by withdrawing the cards for the coefficients $a$, $b$, $c$, $d$, etc.
of the desired polynomial. For instance, if one of the polynomial coefficients is
$14485\,x^{7}$, we select from the $x^{7}$ section of the table all cards containing the
multiples 10000, 4000, 400, 80, and 5. In the $x^{7}$ table there are 4 cards for each
multiple, one each for terms ${}^{8}T_{4}$, ${}^{6}T_{3}$, ${}^{4}T_{2}$, and ${}^{2}T_{1}$. These cards are combined with
the cards selected for the other coefficients of the polynomial and sorted to bring
all cards for each ${}^{j}T_{i}$ together. The cards for each term ${}^{j}T_{i}$ are then
automatically added on the electric accounting machine.
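
The card selection for $14485\,x^{7}$ can be illustrated as below. The sketch only mimics the arithmetic of pulling and adding cards; the names `multiples` and `X7_TERMS` are assumptions of the example, and the coefficients are those of the $x^{7}$ formula in (3).

```python
def multiples(n):
    """Split a positive integer into its decimal card multiples, skipping zero digits."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - k - 1) for k, d in enumerate(digits) if d != "0"]


# coefficient of each term jT_i in the formula for x**7, keyed by (j, i)
X7_TERMS = {(8, 4): 5040, (6, 3): 1680, (4, 2): 126, (2, 1): 1}

selected = multiples(14485)
# Adding the "coeff. x multiple" fields of the selected cards, term by term.
totals = {term: sum(coeff * m for m in selected) for term, coeff in X7_TERMS.items()}
print(selected)
print(totals)
```

Five multiples times four terms gives the twenty cards withdrawn for this coefficient, and each total equals 14485 times the corresponding coefficient without any multiplication being performed at selection time.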


6. Subdividing tables. In preparing tables it may be desired to prepare
the table in more detail at certain points, giving values of the function at 1/10,
1/20, 1/50, or 1/100, etc., of the interval of the rest of the table. This may
readily be done by recalculating the coefficients of the cumulative terms, and
using these values in the same manner as the original ones.

There are many formulas for the determination of the subdivided differences
given in various texts on interpolation, such as those given by Comrie [8] and
Bower [9]. One effective method is to use formulas (3) to calculate the
subdivided differences. The values called for in the formula for the highest power
are taken from the table of the function at the regular interval, giving effect to
the rule involving subscripts. These coefficients are reduced by an amount
sufficient to cancel the coefficient of the highest cumulative term, and the
coefficients of the remaining cumulative terms are reduced in proportion according
to formula (3) for the highest power. Usually the coefficient of the highest term
of the formula will divide evenly into the coefficient taken from the table, and
the other reductions are calculated by multiplying this result by the other
coefficients of the formula. The highest remaining coefficient is then reduced
by an amount sufficient to cancel itself, and, by use of the formula (3) for the
power whose highest cumulative term matches the highest remaining coefficient,
the reduction to the remaining cumulative terms is calculated and subtracted.
The highest remaining coefficient is reduced in a like manner, and this process is
continued until all the cumulative coefficients have been analyzed.

The partial cumulative coefficients thus computed are multiplied by the
desired subdivision $1/m$ raised to the power of the corresponding formula (3),
and recombined to form the new coefficients, as shown in the example below.
In taking values from the table, when the subscript does not change, the tabular
value must be reduced by the amount of the higher coefficient with the same
subscript, to give effect to the rule that the coefficients in such cases are
increments (see last example in section 3).

To subdivide the polynomial of section 4 at $x = 7.0$, we take the italicized
values from Table V starting at $f(7)$ as ${}^{1}T_{0}$, and proceed as follows:


                  6T3     5T3     4T2     3T2     2T1     1T0
From Table V      120     960    3390   10324   15994   16930
                         -120           -3390
F(x)              120     840    3390    6934   15994   16930
ax^5              120              30               1
                          840    3360    6934   15993
bx^4                      840     420      70      35
                                 2940    6864   15958
cx^3                             2940             490
                                         6864   15468
dx^2                                     6864    3432
                                                12036
ex                                              12036
                                                        16930


When the interval is 1/10 we have:

                       6T3      5T3       4T2       3T2          2T1      1T0
x^5/10^5 =          .00120             .00030                 .00001
35x^4/10^4 =                  .0840    .04200     .0070       .00350
490x^3/10^3 =                         2.94000                 .49000
3432x^2/10^2 =                                  68.6400     34.32000
12036x/10 =                                               1203.60000   + 16930

$f(x) = .00120\,{}^{6}T_{3} + .0840\,{}^{5}T_{3} + 2.9823\,{}^{4}T_{2} + 68.6470\,{}^{3}T_{2} + 1238.41351\,{}^{2}T_{1} + 16930\,{}^{1}T_{0}$
provides the coefficients for subtabulating the function at the desired
interval, beginning at the argument $x = 7.0$.
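
The subdivided coefficients can be cross-checked by a different route. The sketch below does not follow the reduction from the table described above; instead it obtains the partial coefficients 1, 35, 490, 3432, 12036, 16930 by shifting the polynomial to the new origin with Horner's method, and then scales and recombines them with formulas (3). The names `shift`, `subdivide` and `POWER_FORMULAS` are inventions of the example.

```python
from fractions import Fraction

# (coefficient, j, i) triples of formulas (3) for the powers 0 to 5
POWER_FORMULAS = {
    0: [(1, 1, 0)],
    1: [(1, 2, 1)],
    2: [(1, 2, 1), (2, 3, 2)],
    3: [(1, 2, 1), (6, 4, 2)],
    4: [(1, 2, 1), (2, 3, 2), (12, 4, 2), (24, 5, 3)],
    5: [(1, 2, 1), (30, 4, 2), (120, 6, 3)],
}


def shift(coeffs, r):
    """coeffs[k] multiplies x**k; returns the coefficients in powers of (x - r)."""
    c = list(coeffs)
    n = len(c)
    for k in range(n - 1):
        for idx in range(n - 2, k - 1, -1):   # repeated synthetic division by (x - r)
            c[idx] += r * c[idx + 1]
    return c


def subdivide(coeffs, r, m):
    """Cumulative coefficients {(j, i): value} for tabulating from x = r in steps of 1/m."""
    merged = {}
    for power, a in enumerate(shift(coeffs, r)):
        scaled = Fraction(a) / m**power
        for c, j, i in POWER_FORMULAS[power]:
            merged[(j, i)] = merged.get((j, i), 0) + scaled * c
    return merged


new = subdivide([4, 3, 2, 0, 0, 1], r=7, m=10)
for term in sorted(new, reverse=True):
    print(term, float(new[term]))
```

Exact fractions are used so that the comparison with the six decimal coefficients is not disturbed by floating-point error.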

7. Accuracy of Tables. When the values of the coefficients are not
exact, owing to the original values for $a$, $b$, $c$ etc. or the dropping of decimals in
the computation of the coefficients, the errors accumulate fairly rapidly. Each
coefficient will introduce its own error into the summation.
To maintain accuracy throughout a long table it is advisable to transform $f(x)$
by Horner's method of decreasing the roots [10, pp. 100-101], compute new
coefficients for the transformed equation at intervals, and prepare the table in
sections. Decreasing the roots by $r$ gives us a new starting point at $x = r$.

Since two or more functions may be computed at one time, a function for
which the coefficients are not exact may be computed by adding in the usual
way from the starting values and subtracting from the ending values
simultaneously. As many digits as agree in both tabulations of the function may be
considered correct.

The tabulations can be made to practically any degree of accuracy on the
equipment available, as the newer machines can be formed into counters of any
capacity up to 80 digits. In practice, counters of 16, 20 or 24 digits will
ordinarily suffice for the accuracy desired and two or more functions can be evaluated
simultaneously. Cards are read and added at the rate of 150 per minute, or read,
added and listed on the tape at the rate of 80 per minute and new summary
cards produced at the rate of 40 per minute (on alphabetic equipment with gang
summary punches). Computation may be carried out with additional decimal
places and the final tabulation of the function rounded off to the nearest number
retained.

8. Summary. The cumulative or progressive-total method is shown to be 
applicable to the preparation of tables of functions expressed in the form of 
a power series. 

The cumulative formulas for the powers through the twelfth power have been 
presented, and simple methods are given for transforming a power series into its 
corresponding cumulative formula, for changing the interval of the table, 
rounding off the values of the function, and subdividing the table at desired 
points. 

It is hoped that this discussion will make tables in printed or punched-card 
form more generally available as a tool for the computer. Since tables may be so 
readily prepared by this process, the usefulness of the tabular method of solving 
problems is greatly increased. 

The author wishes to acknowledge his thanks to Professor P. S. Dwyer for 
various suggestions, particularly in connection with section 2. 
ON THE PROBABILITY OF THE OCCURRENCE OF AT LEAST m
EVENTS AMONG n ARBITRARY EVENTS


By Kai Lai Chung

Tsing Hua University, Kunming, China

Introduction. Let $E_1, \dots, E_n$ denote $n$ arbitrary events. Let
$p_{\nu_1' \cdots \nu_i' \nu_{i+1} \cdots \nu_j}$, where $0 \le i \le j \le n$ and $(\nu_1, \dots, \nu_j)$ is a combination of the
integers $(1, \dots, n)$, denote the probability of the non-occurrence of $E_{\nu_1}, \dots, E_{\nu_i}$
and the occurrence of $E_{\nu_{i+1}}, \dots, E_{\nu_j}$. Let $p_{[\nu_1 \cdots \nu_i]}$ denote the probability of
the occurrence of $E_{\nu_1}, \dots, E_{\nu_i}$ and no others among the $n$ events. Let $S_j = \sum p_{\nu_1 \cdots \nu_j}$
where the summation extends to all combinations of $j$ of the $n$ integers
$(1, \dots, n)$. Let $p_m(\nu_1, \dots, \nu_k)$, $(1 \le m \le k \le n)$, denote the probability of
the occurrence of at least $m$ events among the $k$ events $E_{\nu_1}, \dots, E_{\nu_k}$.

By the set $(x_1, \dots, x_b, \dots, x_a) - (x_1, \dots, x_b)$ (where $b \le a$) we mean the
set $(x_{b+1}, \dots, x_a)$. And by a $\binom{a}{b}$-combination out of $(x_1, \dots, x_a)$ we mean
a combination of $b$ integers out of the $a$ integers $(x_1, \dots, x_a)$.

We often use summation signs with their meaning understood, thus for a fixed
$i$, $1 \le i \le n$, the summations in $\sum p_{\nu_1 \cdots \nu_i}$ or $\sum p_m(\nu_1, \dots, \nu_i)$ extend to all
the $\binom{n}{i}$-combinations out of $(1, \dots, n)$.

The following conventions concerning the binomial coefficients are made:

$$\binom{0}{0} = 1, \qquad \binom{a}{b} = 0 \ \text{ if } a < b \text{ or if } b < 0.$$

It is a fundamental theorem in the theory of probability that, if $E_1, \dots, E_n$
are incompatible (or "mutually exclusive"), then

$$p_1(1, \dots, n) = p_1 + \cdots + p_n.$$

When the events are arbitrary, we have Boole's inequality

$$p_1(1, \dots, n) \le p_1 + \cdots + p_n.$$


Gumbel¹ has generalized this inequality to the following:

$$p_1(1, \dots, n) \le \frac{\sum p_1(\nu_1, \dots, \nu_k)}{\binom{n-1}{k-1}}$$

for $k = 1, \dots, n$. The case $k = 1$ gives Boole's inequality. Fréchet² has
announced that Gumbel's result can be sharpened to the following

(1) $$A_{k+1} = \frac{\sum p_1(\nu_1, \dots, \nu_{k+1})}{\binom{n-1}{k}} \le \frac{\sum p_1(\nu_1, \dots, \nu_k)}{\binom{n-1}{k-1}} = A_k$$

for $k = 1, \dots, n - 1$. Thus, $A_k$ is non-increasing for $k$ increasing. On the
other hand, Poincaré has obtained the following formula which expresses
$p_1(1, \dots, n)$ in terms of the $S_j$'s,

(2) $$p_1(1, \dots, n) = \sum p_{\nu_1} - \sum p_{\nu_1 \nu_2} + \sum p_{\nu_1 \nu_2 \nu_3} - \cdots + (-1)^{n-1} p_{1 \cdots n} = \sum_{j=1}^{n} (-1)^{j-1} S_j.$$

¹ C. R. Acad. Sci., Vol. 205 (1937), p. 774.



In the present paper we shall study the more general function $p_m(\nu_1, \dots, \nu_k)$
as defined above. First we generalize Poincaré's formula and Fréchet's
inequalities. In Theorem 1 we establish (for $1 \le m \le n$)

(3) $$p_m(1, \dots, n) = \sum p_{\nu_1 \cdots \nu_m} - \binom{m}{1} \sum p_{\nu_1 \cdots \nu_{m+1}} + \binom{m+1}{2} \sum p_{\nu_1 \cdots \nu_{m+2}} - \cdots + (-1)^{n-m} \binom{n-1}{n-m} p_{1 \cdots n} = \sum_{i=0}^{n-m} (-1)^{i} \binom{m+i-1}{i} S_{m+i}.$$

Although this result is well known, we prove it in preparation for Theorem 2.
Theorem 3 establishes

(4) $$A^{(m)}_{k+1} = \frac{\sum p_m(\nu_1, \dots, \nu_{k+1})}{\binom{n-m}{k+1-m}} \le \frac{\sum p_m(\nu_1, \dots, \nu_k)}{\binom{n-m}{k-m}} = A^{(m)}_{k}$$

for $k = 1, \dots, n - 1$ and $1 \le m \le k$.

Next, we extend the inequalities (4), and in Theorem 4 we show that

(5) $$A^{(m)}_{k} \le \tfrac{1}{2}\left(A^{(m)}_{k-1} + A^{(m)}_{k+1}\right),$$

which states that the differences $A^{(m)}_{k} - A^{(m)}_{k+1}$ $(k = 1, \dots, n - 1)$ are
non-increasing for increasing $k$. From this and a simple result we can deduce (4). Also
Theorem 2 establishes that

(6) $$\sum_{i=0}^{2l+1} (-1)^{i} \binom{m+i-1}{i} S_{m+i} \le p_m(1, \dots, n) \le \sum_{i=0}^{2l} (-1)^{i} \binom{m+i-1}{i} S_{m+i}$$

for $2l + 1 \le n - m$ and $2l \le n - m$ respectively. These inequalities throw
light on formula (3) and are sharper than the following analogue of Boole's
inequality for $p_m(1, \dots, n)$, which is a special case of (4):

(7) $$p_m(1, \dots, n) \le \sum p_{\nu_1 \cdots \nu_m}.$$

The last statement will be evident in the proof.

² Loc. cit., Vol. 208 (1939), p. 1703.

In Theorem 5 we give an "inversion" of the formula (3), i.e. we express $p_{1 \cdots n}$
in terms of the $p_m(\nu_1, \dots, \nu_k)$'s as follows:

(8) $$\binom{n-1}{m-1} p_{1 \cdots n} = \sum p_m(\nu_1, \dots, \nu_m) - \sum p_m(\nu_1, \dots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \dots, n) = \sum_{i=0}^{n-m} (-1)^{i} \sum p_m(\nu_1, \dots, \nu_{m+i}).$$

This of course implies the following more general formula for $p_{\alpha_1 \cdots \alpha_r}$,

$$\binom{r-1}{m-1} p_{\alpha_1 \cdots \alpha_r} = \sum_{i=0}^{r-m} (-1)^{i} \sum p_m(\nu_1, \dots, \nu_{m+i})$$

where $(\alpha_1, \dots, \alpha_r)$ is a combination of the integers $(1, \dots, n)$ and where the
second summation extends to all the $\binom{r}{m+i}$-combinations of $(\alpha_1, \dots, \alpha_r)$.
Since it is known³ that we can express other functions such as $S_r$, $p_{[\mu_1 \cdots \mu_r]}$ in
terms of the $p_{\nu_1 \cdots \nu_j}$'s, we can also express them in terms of the $p_m(\nu_1, \dots, \nu_k)$'s,
provided $r \ge m$.

Finally, for the case $m = 1$, we give in Theorem 6 an explicit formula for
$p_{[1 \cdots r]}$ in terms of the $p_1(\nu_1, \dots, \nu_k)$'s, as shown in (9),

(9) $$p_{[1 \cdots r]} = -p_1(r + 1, \dots, n) + \sum p_1(\nu_1, r + 1, \dots, n) - \sum p_1(\nu_1, \nu_2, r + 1, \dots, n) + \cdots + (-1)^{r-1} p_1(1, \dots, r, r + 1, \dots, n) = \sum_{i=0}^{r} (-1)^{i-1} \sum_{(\nu_1, \dots, \nu_i)} p_1(\nu_1, \dots, \nu_i, r + 1, \dots, n),$$

where $(\nu_1, \dots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations from $(1, \dots, r)$.
This of course implies the following more general formula:

$$p_{[\alpha_1 \cdots \alpha_r]} = \sum_{i=0}^{r} (-1)^{i-1} \sum_{(\nu_1, \dots, \nu_i)} p_1(\nu_1, \dots, \nu_i, \alpha_{r+1}, \dots, \alpha_n),$$

where $(\alpha_1, \dots, \alpha_r, \dots, \alpha_n)$ is a permutation of $(1, \dots, n)$ and where
$(\nu_1, \dots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations out of $(\alpha_1, \dots, \alpha_r)$. From
Theorem 6 and two lemmas we deduce a condition of existence of systems of
events associated with the probabilities $p_1(\nu_1, \dots, \nu_k)$. The author has not
been able to obtain similar elegant results for the general $m$. Probably they
do not exist.

³ Fréchet, "Condition d'existence de systèmes d'événements associés à certaines
probabilités," Jour. de Math., (1940), p. 51-62.


2. Generalization of Poincaré's formula; Generalization and sharpening of
Boole's inequality.

Theorem 1:

$$p_m(1, \dots, n) = \sum p_{\nu_1 \cdots \nu_m} - \binom{m}{1} \sum p_{\nu_1 \cdots \nu_{m+1}} + \binom{m+1}{2} \sum p_{\nu_1 \cdots \nu_{m+2}} - \cdots + (-1)^{n-m} \binom{n-1}{n-m} p_{1 \cdots n}.$$

Proof: We have

(10) $$p_m(1, \dots, n) = \sum_{b=0}^{n-m} \sum p_{[\mu_1 \cdots \mu_{m+b}]},$$

where the second summation extends, for a fixed $b$, to all the $\binom{n}{m+b}$-combinations
of $(1, \dots, n)$. Further we have

(11) $$p_{\nu_1 \cdots \nu_{m+c}} = \sum_{d=0}^{n-m-c} \sum p_{[\nu_1 \cdots \nu_{m+c} \cdots \nu_{m+c+d}]},$$

where the second summation extends, for a fixed $d$, to all the $\binom{n-m-c}{d}$-combinations
of $(1, \dots, n) - (\nu_1, \dots, \nu_{m+c})$. The formulas (10) and (11) are
evident by observing that the probabilities in the summations are all additive.
Now we count the number of times a fixed $p_{[\mu_1 \cdots \mu_{m+b}]}$ appears in (3). By (11)
this is equal to the sum

$$\binom{m+b}{m} - \binom{m}{1}\binom{m+b}{m+1} + \binom{m+1}{2}\binom{m+b}{m+2} - \cdots = 1,$$

since this number is the coefficient of $(-1)^{m} x^{m}$ in the expansion of

$$(1 - x)^{m+b} \left(1 - \frac{1}{x}\right)^{-m} = (-1)^{m} x^{m} (1 - x)^{b}.$$

Thus by (10) we have (3).
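
Formula (3), and the bounds obtained by stopping its sum early, can be checked by brute force on a small example. The sketch below is only an illustration: it builds an arbitrary probability distribution over the $2^{n}$ patterns of occurrence with exact fractions, and the names `atom`, `p_joint`, `p_at_least` and `partial_sum` are inventions of the example.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random

n = 5
rng = random.Random(1)
weights = [rng.randint(1, 9) for _ in range(2**n)]
# atom[mask]: probability that exactly the events whose bits are set in mask occur
atom = [Fraction(w, sum(weights)) for w in weights]


def p_joint(events):
    """Probability that all the listed events occur (the others are unrestricted)."""
    need = sum(1 << v for v in events)
    return sum(p for mask, p in enumerate(atom) if mask & need == need)


def p_at_least(m):
    """p_m(1, ..., n): probability that at least m of the n events occur."""
    return sum(p for mask, p in enumerate(atom) if bin(mask).count("1") >= m)


S = {j: sum(p_joint(c) for c in combinations(range(n), j)) for j in range(1, n + 1)}


def partial_sum(m, last):
    """The sum in formula (3) stopped after the term with index `last`."""
    return sum((-1) ** i * comb(m + i - 1, i) * S[m + i] for i in range(last + 1))


for m in range(1, n + 1):
    print(m, p_at_least(m), partial_sum(m, n - m))
```

Stopping the sum at an even index gives an upper bound and at an odd index a lower bound, which is the content of the inequalities (6).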
Theorem 2: For $2l + 1 \le n - m$ and $2l \le n - m$ respectively, we have

(6) $$\sum_{i=0}^{2l+1} (-1)^{i} \binom{m+i-1}{i} S_{m+i} \le p_m(1, \dots, n) \le \sum_{i=0}^{2l} (-1)^{i} \binom{m+i-1}{i} S_{m+i}.$$



Proof: By the reasoning in the previous proof, it is sufficient (in fact also 
necessary) to show that 

*1 i a _l. iA «+» 




Since 


/m - 1 + A/m + 6\ = (m + b)\ /b\ 1 

\ z )\m + v (m — 1)! b\ \i) m + i 

is an integer, it is sufficient to show that 

(12) £(-i)<( b ) 1 >0, E(-irg) 1 go. 

»-o \i / m + z t—o V / t 


b — i 

Suppose 6 > 0 is even. For i g 6/2 — 1, we ha ve — 

i + 1 


> 1 so that T 


b — i 


i+ 1 


i + 2 
i + 1 


Also 


m + i 


m -f- z -j- 1 z Hb 2 


^ 1±4 form ^ 1. Hence 


/ b \ 1 = b — i m + i /b\ 1 

\i + 1/ m + i + 1 i+lm + i+ l \i/ m + i 

^ { + 2 1 + 1 1 = ( b \ 1 

~ z + 1 i + 2 \z/ m + i \i) m + z* 

For z ^ 6/2 we have ^ 1 < 1 so that \ ~\ - ^ 4 "-^—- < 1 and 

z + 1 z+lm + z+1 

(. b \__J _ 

\z + 1/ m + i + 1 \ij m + i 
Thus the absolute values of the terras of the alternating series 


M/b 


6 ! 


V ( —___ 

iZ o v Vz7 w + z (m + 6)!(m — 1)! 

are monotone increasing as long as z % | — 1, reaching maximum at z = ^ and 

2 2 

then become monotone decreasing. 

Therefore (12) evidently holds for 21 % b/2 and 21 + 1 g 6/2 respectively. 
For t S | + 1 we write 

V (_i)‘( b X_L_ « _ - _ j* (_i )*( b \ 1 

ra V '\t'/w + i (to + 6)!(to — 1)1 ,C?+i ; \V wl + * 
<-<+i

6 - 1-1 


__ 1 

(to + 6)1 (to — 1)1 H W»n + fc-i' 
From the above and the fact that


6 ! 


1 


we see that the 


(m + 6)!(m — 1)! m + b 
righthand side is an alternating series whose terms are non-decreasing in absolute 
values. Hence (12) is true. 

If b is odd, the case is similar. 


3. Generalization of Fréchet's inequalities and related inequalities. Before
proving our remaining theorems, we shall give a more detailed account of
the general method which will be used. In the foregoing work we have
already given two different expressions for the function $p_m(1, \dots, n)$, namely,
formulas (3) and (10), but they are not convenient for our later purposes.
Formula (3) is inconvenient because it is not additive and because the $p_{\nu_1 \cdots \nu_j}$'s
are related in magnitudes; while formula (10) has gone so far in the separation
of the additive constituents that its application raises algebraical difficulties.
Let us therefore take an intermediate course.

Let each $\binom{n}{m}$-combination $(\nu_1, \dots, \nu_m)$ out of $(1, \dots, n)$ be written so that
$\nu_1 < \nu_2 < \cdots < \nu_m$. Then we arrange them in an ordered sequence in the
following way: the combination $(\nu_1, \dots, \nu_m)$ is to precede the combination
$(\mu_1, \dots, \mu_m)$ if, for the first $\nu_i \ne \mu_i$, we have $\nu_i < \mu_i$. After such an
arrangement we symbolically denote these combinations by

$$\mathrm{I}, \ \mathrm{II}, \ \dots, \ \binom{n}{m}.$$

Further, all the $\binom{k}{m}$-combinations out of $(\nu_1, \dots, \nu_k)$ where the latter is a
combination out of $(1, \dots, n)$ are arranged in the order in which they appear in
the sequence just written. For example, all the $\binom{4}{2}$-combinations out of
(1, 2, 3, 4) are ordered thus:

(12) (13) (14) (23) (24) (34).

Let $U$ denote a typical combination $(\mu_1, \dots, \mu_m)$. By $E_U$ we mean the
combination of events $E_{\mu_1}, \dots, E_{\mu_m}$ so that $p_U = p_{\mu_1 \cdots \mu_m}$. In general, let the
combinations $U_1, \dots, U_{b-1}, U_b$ be given, then $p_{U_1' \cdots U_{b-1}' U_b}$ denotes the
probability of the non-occurrence of $U_1, \dots, U_{b-1}$ and the occurrence of $U_b$.

Now let $\mathrm{I}, \mathrm{II}, \dots, \binom{k}{m} - 1 = Y, \binom{k}{m} = Z$ denote all the $\binom{k}{m}$-combinations
out of $(\nu_1, \dots, \nu_k)$ in their assigned order. We have

(13) $$p_m(\nu_1, \dots, \nu_k) = p_{\mathrm{I}} + p_{\mathrm{I}'\mathrm{II}} + p_{\mathrm{I}'\mathrm{II}'\mathrm{III}} + \cdots + p_{\mathrm{I}' \cdots Y' Z}.$$

This fundamental formula is evident. Of course it is possible to identify the
$p$'s on the right-hand side with the ordinary $p_{\nu_1' \cdots \nu_j}$'s but we shall refrain from
so doing and be content with the following example:

$$p_2(1, 2, 3, 4) = p_{12} + p_{12'3} + p_{12'3'4} + p_{1'23} + p_{1'23'4} + p_{1'2'34}.$$
Theorem 3. For $k = 1, \dots, n - 1$ and $1 \le m \le k$ we have

$$\binom{n-m}{k-m} \sum p_m(\nu_1, \dots, \nu_{k+1}) \le \binom{n-m}{k+1-m} \sum p_m(\nu_1, \dots, \nu_k).$$

Proof. Substitute (13) and a similar formula for k + 1 into the two sides 
respectively. After this substitution we observe that the number of terms is 
the same on both sides, since 

( n —m\/ n \A + l\_Y n — m \/n\/fc\ 
k — m) \k + l/\ m / + 1 — m)\k) \m/' 

Also, the number of terms with a given 11 = (mi , • • • , Mm) unaccented is the 
same, since 

( n — m\( n — m \ / n — m \/n — m\ 

k —m) \k + 1 — m) ~ \k + 1 — m/\k — m/ 


Let the sum of all the terms with U unaccented in the two summations be 
denoted by <r k +i = <r*+i (mi , • • • , Mm) and <r k * <r* (mi , ■ • • , Mm) respectively. It 
is. sufficient to prove, that 


( n — m\ ( n — m \ 

k -m) ~ \k + 1 - m) ak ’ 


k _ m ) te 1 * 1118 each of the form ...„ m where 

0 ^ g Mm - wi and where (vi, • • • , ^ , mi , • • • , Mm) is a ^-combination 

otft Of (1, • •• , Mm). For fixed (mi , • • • , Mm) and a fixed l but varying X’s, <r k 
contains ^ n _ ^ terms of the form , with exactly l accented 

subscripts. Let the sum of all such terms be denoted by <r k l) . Evidently <r k l) 
*“» ('" 7 m ) terms. As a check we have 

(n ~ Mm\ /Mm ~~ , / n - Mm \ Aim ~ Wl\ , 

0 1 / <r ‘” 

, /n — Mm\ /Mm - ^ /n - m\ 

^ \fc - Mm/ \Mm - mj \k — m/' 

which is the total number of terms in <r* . 

We decompose these p's partially, as follows: 

' Mm “ Zu +«Ml * * *Mh»+6 > 

0 Mm+1.-.-.Mm+6 

where fa, • • • , vi+ 4 ,ym, • • • , #»*+») is a permutation of (1, ••• , m») and where 
the second summation extends, for a fixed b, to all the m ^-combina¬ 

tions out of (1, • • • , n m ) — (ri, • • • , pi , mi > • • • Hm)< 
Now consider a given

where 0 £ t & ti m — m and (pi • • • p<Xi • • • X^n • • • p») is a permutation of 

(1, • • • , p„). It appears times in <r*°. Hence it appears 

(n — p»\ (t\,( n — ix m \(t\ f n — — p» + t\ 

\fc-WVv \fc-m- 1/V/ + h V*-«“*/V/ \ *-*» / 

times in <r* ■ 

Therefore to prove (14) it is sufficient to prove that 

(n - m\/n — fim + A / n — to \ /« — m« + <\ 

\k — m)\k +1 — to/ — \fc + 1 — m) \ k — m )' 

By an easy reduction we have 

(n — p m + t — k + to) 2s n — k 


or 

— Pm + t + to S 0; 

since t £ n„ — m this is obvious. 

Theorem 4: For $2 \le k \le n - 1$ and $1 \le m \le k$ we have

(5) $$\frac{\sum p_m(\nu_1, \dots, \nu_k)}{\binom{n-m}{k-m}} \le \frac{1}{2}\,\frac{\sum p_m(\nu_1, \dots, \nu_{k-1})}{\binom{n-m}{k-1-m}} + \frac{1}{2}\,\frac{\sum p_m(\nu_1, \dots, \nu_{k+1})}{\binom{n-m}{k+1-m}}.$$


Proof: By the reasoning in the previous proof, it is sufficient to show that 

/ n — m \ / n — m \/n-p m + «\ 

\k — 1 — m/\k + 1 — to/\ k — m ) 

/n - to\ / n-m \ /n —'p* + t\ 

~ \k — mj\k + 1 — rnj\k — 1 — m) 

/n — m\( n — m \fn — Mm + t\ 
\k — m/\k — 1 — m/\k + 1 — to/’ 

for 0 t £ Mm — to. By an easy reduction this is equivalent to 

2(» — k)(n — Mm + i — k + m + 1) £ (n — k + 1 )(n — k) 

+ (n — Mm + t — k + m+ l)(n — Mm + t — k + to) 


or 

(n-Mm + t- k + m + 1 )(p» - t - to) g (n - k)(jx m - t - to). 
For i = Mm — to we have equality, otherwise we have 

— Pm + t + TO + 1 SO. 
We can deduce Theorem 3 from Theorem 4 and the following result (a case 
of generalized Gumbel inequalities): 

(16) ^ I ' ‘‘» ^ 2p»(ri> »)• 

Proof of (16): Substitute from (13). Consider the $p$'s with $\mu$ unaccented. 
The number of such terms is the same on both sides. But on the left-hand side 
they are all the same $p_{1'2'\cdots(\mu-1)'\mu}$, while those on the right-hand side, being of 
the form $p_{\mu_1'\cdots\mu_\lambda'\mu}$ where $0 \le \lambda \le \mu - 1$ and $(\mu_1, \cdots, \mu_\lambda)$ is a combination 
out of $(1, \cdots, \mu - 1)$, are greater than or equal to it. Hence the result. 


4. The p ai -.- a< ’s in terms of the p m (vi, • • •, k*)’s and the piai-a^'s in terms 
of the pi(n, • • •, v*)*s. 

Theorem 5: For $1 \le m \le n$ we have 

$$(8)\quad \binom{n-1}{m-1}\, p_{1\cdots n} = \sum p_m(\nu_1, \cdots, \nu_m) - \sum p_m(\nu_1, \cdots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \cdots, n) = \sum_{i=0}^{n-m} (-1)^i \sum p_m(\nu_1, \cdots, \nu_{m+i}).$$

Proof: As in the proof of Theorem 3, consider $\sigma_k(\mu_1, \cdots, \mu_m)$. Here 
$m \le k \le n$. Since a given 

$$(16)\quad p_{\nu_1'\cdots\nu_t'\lambda_1\cdots\lambda_s\mu_1\cdots\mu_m}$$

appears $\binom{n-\mu_m+t}{k-m}$ times in $\sigma_k$, it appears 

$$\sum_{k=m}^{n} (-1)^{k-m}\binom{n-\mu_m+t}{k-m} = \begin{cases} 0, & \text{if } n - \mu_m + t \ge 1, \\ 1, & \text{if } n - \mu_m + t = 0, \end{cases}$$

times on the right hand side of (8). Hence for fixed $(\mu_1, \cdots, \mu_m)$, the only 
$p$'s of the form (16) which actually appear are those with $t = \mu_m - n$. But 
$\mu_m \le n$, thus $t = 0$, $\mu_m = n$, and $(\lambda_1, \cdots, \lambda_s, \mu_1, \cdots, \mu_m)$ is a permutation of 
$(1, \cdots, n)$. The term in question is therefore $p_{1\cdots n}$. Since the number of 
$m$-combinations of $(1, \cdots, n)$ with $\mu_m = n$ is $\binom{n-1}{m-1}$, we have the theorem. 

Theorem 6: For $1 \le r \le n - 1$, we have 

$$(9)\quad p_{[1\cdots r]} = -p_1(r+1, \cdots, n) + \sum p_1(\nu_1, r+1, \cdots, n) - \sum p_1(\nu_1, \nu_2, r+1, \cdots, n) + \cdots + (-1)^{r-1} p_1(1, \cdots, n) = \sum_{i=0}^{r} (-1)^{i-1} \sum p_1(\nu_1, \cdots, \nu_i, r+1, \cdots, n),$$

where $(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations out of $(1, \cdots, r)$. 
Proof: We rewrite (14) for the special case $m = 1$, 

$$(17)\quad p_1(\mu_1, \cdots, \mu_k) = p_{\mu_1} + p_{\mu_1'\mu_2} + \cdots + p_{\mu_1'\cdots\mu_{k-1}'\mu_k},$$

where $\mu_1 < \mu_2 < \cdots < \mu_k$. Substitute into the right hand side of (9). After 
the substitution let the sum of all those $p$'s with $\mu$ unaccented be denoted by 
$\sigma_\mu$. The terms in $\sigma_\mu$ are of the form $p_{\mu_1'\cdots\mu_{s-1}'\mu}$ where $1 \le s \le \mu$ and 
$(\mu_1, \cdots, \mu_{s-1})$ is a combination out of $(1, \cdots, \mu - 1)$. 

First consider a fixed y g r. For a fixed p M ;.we count the number of 
times it appears in <r M , that is, on the right hand side of (9). This is evidently 
equal to 


Z (-1)' 




0, 

1, 


if r — ac ^ 1, 

if r — m = 0. 


Thus the only terms that actually appear are those with m = r; and each of such 
terms appears exactly once with the sign (—1)*. Hence their total 

contribution is 


(18) Pr — X) TMr + S P'W ~ * * * + ( — 1)' * Pi'• • - ( r -l)'r = Pi--r , 

fi FX.rj 

by an easy modification of Poincaré's formula. 

Next consider a fixed m r + 1. Every term with /x unaccented in <r M is of 
the form (with the usual convention for /jl = r + 1) p M j... M ; <h-i)'..vm-i)'m > where 
(au , • • • , m«) is a combination out of (1, • • • , r); and it appears exactly once 
with the sign ( — 1)*. Their total contribution is therefore 

— P(r+l)'“-(M-l)V + ]C P>i(r+ 1)'---(M-1)'M “ P^OH-l) ' • • * (m-D'm + * * * 

+ ( —lJ^Pi'.-fM+DV Pl‘*‘r(r+l) , *-*0*-l)V> 


by another application of Poincaré's formula. Summing up for $\mu = 
r + 1, \cdots, n$, we obtain 


(19) — (Pl-..r(r+l) + Pl...r(r+l)'(r+S) + * • • + Pl.--r(r+l)'...(n-l)'»). 


Adding (18) and (19), we obtain as the sum of the right-hand side of (9) 


Pl-.-r — (Pl--.r(r+l) + Pl---r(r+l)'(r+2) + * * • + Pi - - -r(r+l)'.. .(n-1) 'n) 


by an easy modification of (17). 


= Pi • • -r(r-fl ) 9 (r+2)'•••»' = P[l---r] 


5. A condition for existence of systems of events associated with the proba¬ 
bilities pi(vi, • • •, v k ). 

Lemma 1: Let any $2^n - 1$ quantities $q(\alpha_1, \cdots, \alpha_k)$ be given, where $k = 
1, \cdots, n$ and for a fixed $k$, $(\alpha_1, \cdots, \alpha_k)$ runs through all the $\binom{n}{k}$-combinations 
out of $(1, \cdots, n)$. Let the quantities $Q(\alpha_1, \cdots, \alpha_k)$ be formed as follows: 

$$Q(0) = 1 - q(1, \cdots, n),$$

$$Q(\alpha_1, \cdots, \alpha_k) = -q(\alpha_{k+1}, \cdots, \alpha_n) + \sum q(\nu_1, \alpha_{k+1}, \cdots, \alpha_n) - \sum q(\nu_1, \nu_2, \alpha_{k+1}, \cdots, \alpha_n) + \cdots + (-1)^{k-1} q(1, \cdots, n),$$

where $(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{k}{i}$-combinations out of $(1, \cdots, n) - 
(\alpha_{k+1}, \cdots, \alpha_n)$. Then the sum of all these $Q$'s is equal to 1. 

Proof: Add all these $Q$'s and count the number of times a fixed $q(\mu_1, \cdots, \mu_k)$ 
appears in the sum. For $1 \le k \le n$ this number is equal to 

$$-1 + \binom{k}{1} - \binom{k}{2} + \cdots + (-1)^{k-1}\binom{k}{k} = 0.$$

Hence we have the lemma. 

Lemma 2: (Fréchet) Given $2^n$ quantities $Q_{[\alpha_1\cdots\alpha_r]}$ where $(\alpha_1, \cdots, \alpha_r)$ runs 
through all combinations out of $(1, \cdots, n)$ including the empty one. The necessary 
and sufficient condition that there exist systems of events $E_1, \cdots, E_n$ for which 

$$p_{[\alpha_1\cdots\alpha_r]} = Q_{[\alpha_1\cdots\alpha_r]}$$

(where $p_{[0]}$ denotes the probability for the non-occurrence of $E_1, \cdots, E_n$) is 
that each $Q \ge 0$ and that their sum is equal to 1. 

Proof: Since the probabilities $p_{[\alpha_1\cdots\alpha_r]}$ are independent, i.e., unrelated in 
magnitudes except that their sum is equal to 1, the lemma is evident. 

Theorem 7: Given $2^n - 1$ quantities $q(\alpha_1, \cdots, \alpha_k)$ as in Lemma 1, the necessary 
and sufficient condition that there exist systems of events $E_1, \cdots, E_n$ for which 

$$p_1(\alpha_1, \cdots, \alpha_k) = q(\alpha_1, \cdots, \alpha_k)$$

is that for any combination $(\alpha_{r+1}, \cdots, \alpha_n)$, $1 \le r \le n - 1$, out of $(1, \cdots, n)$ we 
have 

$$-q(\alpha_{r+1}, \cdots, \alpha_n) + \sum q(\alpha_{\nu_1}, \alpha_{r+1}, \cdots, \alpha_n) - \sum q(\alpha_{\nu_1}, \alpha_{\nu_2}, \alpha_{r+1}, \cdots, \alpha_n) + \cdots + (-1)^{r-1} q(1, \cdots, n) \ge 0,$$

and thus 

$$1 - q(1, \cdots, n) \ge 0.$$

Proof: The condition is necessary by Theorem 6. It is sufficient by Lemmas 
1 and 2 and an obvious formula expressing $p_1(\alpha_1, \cdots, \alpha_r)$ in terms of the $p_{[\nu_1\cdots\nu_s]}$'s. 
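
As an informal check of formula (9) and of Lemma 1, the following sketch builds an arbitrary joint distribution for $n = 4$ events, computes every $p_1$ from it, and then recovers the probabilities $p_{[\alpha_1\cdots\alpha_r]}$ by the alternating sums above. The function names, the random weights and the choice $n = 4$ are illustrative assumptions and are not taken from the paper; the value of $q$ for the empty combination is taken as 0.

```python
from fractions import Fraction
from itertools import combinations
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def at_least_one(exact, events):
    """p_1(events): probability that at least one of the listed events occurs."""
    events = frozenset(events)
    return sum((p for occurred, p in exact.items() if occurred & events), Fraction(0))


def exactly(q, n, occurring):
    """Alternating sum of (9) / Lemma 1 for the combination `occurring`.

    `q` maps each non-empty frozenset S to the quantity q(S).
    """
    universe = frozenset(range(1, n + 1))
    occurring = frozenset(occurring)
    rest = universe - occurring
    total = Fraction(1) if not occurring else Fraction(0)  # Q(0) = 1 - q(1..n)
    for chosen in subsets(occurring):
        args = chosen | rest
        if args:  # q of the empty combination contributes nothing
            sign = -1 if len(chosen) % 2 == 0 else 1
            total += sign * q[args]
    return total


n = 4
rng = random.Random(0)
weights = {s: rng.randint(1, 9) for s in subsets(range(1, n + 1))}
norm = sum(weights.values())
exact = {s: Fraction(w, norm) for s, w in weights.items()}  # p_[...] of a made-up system

q = {s: at_least_one(exact, s) for s in subsets(range(1, n + 1)) if s}
recovered = {s: exactly(q, n, s) for s in subsets(range(1, n + 1))}

# Lemma 1 holds for arbitrary quantities q, not only for probabilities.
arbitrary_q = {s: Fraction(rng.randint(-5, 5), 7) for s in q}
lemma_sum = sum(exactly(arbitrary_q, n, s) for s in subsets(range(1, n + 1)))
```

Here `recovered` should coincide with `exact`, and `lemma_sum` should equal 1 whatever values are placed in `arbitrary_q`.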



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A NOTE ON SHEPPARD’S CORRECTIONS 


By Cecil C. Craig 


University of Michigan 

As far as the author is aware, H. C. Carver was the first to point out that 
while the formulae ordinarily given for Sheppard's corrections for central 
moments are valid for moments computed about the population mean, there are 
still systematic errors present when they are applied to central moments 
calculated from any particular grouped frequency distribution [1]. This is due, of 
course, to the fact that the mean of a grouped frequency distribution is in general 
different from that of the distribution before grouping. For a fixed class interval 
$k$, Sheppard's corrections give the average value of a moment of a given order 
about a fixed point for all the groupings of this class width possible, and will 
fail to do so if the moment in question is calculated for each position of the class 
limits about a point which varies as the class limits shift. Thus Carver [1] 
pointed out that the commonly used formula (for a continuous variate), 


$$(1)\quad \mu_2 = \nu_2 - \frac{k^2}{12},$$

should, if $\nu_2$ is calculated about the mean of the grouped distribution as it is in 
practice, be replaced by 

$$(2)\quad \mu_2 = \nu_2 - \frac{k^2}{12} + \sigma_m^2$$

in which $\sigma_m^2$ is the variance of the means of grouped distributions over all 
positions of the class limits with the fixed class width $k$. 

Recently J. A. Pierce [2] gave a method for deriving the required formulae of 
the type of (2) and gave actual formulae for both moments and seminvariants 
through the sixth order. It is the purpose of this note to point out that the use 
of moment generating functions provides a more elegant and concise way of 
arriving at formulae equivalent to Pierce’s though in a somewhat different form. 
This method can be immediately extended to distributions of two or more 
variates. 

In a previous paper [3] on Sheppard's corrections for a discrete variate, the 
author made use of the following argument: It is assumed that for a fixed class 
width $k$, any point in the scale on which the variate $x$ is plotted is as likely to be 
chosen as a class limit as any other; choosing a system of class limits for grouping 
the data is then equivalent to placing at random on the $x$-axis a scale with 
division points at intervals of $k$. Once the system of class limits is chosen any 
value of $x$ before grouping bears to the class mark, $x_i$, of the class in which it 
falls the relation, 

$$(3)\quad x_i = x + \epsilon,$$

in which $x$ and $\epsilon$ are independent variates. The frequency law governing $x$ is, 
of course, that of the population from which it is drawn while $\epsilon$ is distributed 
uniformly over $\left(-\frac{k}{2}, \frac{k}{2}\right)$ for a continuous variate 
and over $\left(-\frac{m-1}{2m}k, \frac{m-1}{2m}k\right)$ if $m$ consecutive values of a discrete variate are 
grouped in each class interval. In either case 

$$(4)\quad M_{x_i}(\vartheta) = M_x(\vartheta)\, M_\epsilon(\vartheta)$$

in which $M_{x_i}(\vartheta)$ is the moment generating function of the variate $x_i$, etc. The 
expansion of both sides of (4) in powers of $\vartheta$ gives the relations between the 
average values of moments of the grouped distribution over all positions of the 
scale and the moments of the ungrouped distribution from which Sheppard's 
corrections are obtained by solving for the moments of the ungrouped 
distribution. The relations are valid for any fixed point about which the moments are 
computed; if this fixed point be taken as the mean of the ungrouped distribution 
the ordinary Sheppard's corrections for central moments result. 

But it is quite easy to modify (4) to give the necessary relations in case the 
moments of each grouped distribution are computed about the mean of that 
distribution. We have only to write 

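$$(5)\quad x_i = (x_i - \bar{x}) + \bar{x}$$

in which $\bar{x}$ is the mean of the grouped distribution for which $x_i$ is one of the class 
marks. Then 

$$(6)\quad M_{x_i}(\vartheta) = M_{x_i - \bar{x},\, \bar{x}}(\vartheta, \omega)\Big|_{\omega = \vartheta}.$$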

If we write, 

in which $\lambda_{rs}$ is the product seminvariant of order $rs$ of moments about the means 
of the grouped distributions and of such means, the expansion of the logarithm 
of the second member of (6) gives 

$$(7)\quad 1 + (\lambda_{10} + \lambda_{01})\vartheta + (\lambda_{20} + 2\lambda_{11} + \lambda_{02})\frac{\vartheta^2}{2!} + \cdots + (\lambda_{10} + \lambda_{01})^{(r)}\frac{\vartheta^r}{r!} + \cdots,$$

in which 

$$(\lambda_{10} + \lambda_{01})^{(r)} = \lambda_{r0} + \binom{r}{1}\lambda_{r-1,1} + \cdots + \binom{r}{s}\lambda_{r-s,s} + \cdots + \lambda_{0r}.$$

The expression of the logarithm of the right member is: 

$$(8)\quad \lambda_1\vartheta + \lambda_2\frac{\vartheta^2}{2!} + \cdots + \sum_{s=1}^{\infty} (-1)^{s+1}\,\frac{B_s k^{2s}}{2s}\left(1 - \frac{1}{m^{2s}}\right)\frac{\vartheta^{2s}}{(2s)!},$$

for a discrete variate (the result for a continuous variable is obtained merely by 
letting $m \to \infty$) in which $\lambda_r$ is the $r$th seminvariant of the ungrouped distribution 
and $B_s$ is the $s$th Bernoulli number. 
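
The grouping terms in (8) can be checked directly for small cases. The sketch below is an illustration only: it assumes that $\epsilon$ takes $m$ equally spaced, equally likely values centred at zero with spacing $k/m$, and it uses the older Bernoulli numbering $B_1 = 1/6$, $B_2 = 1/30$. It computes the second and fourth seminvariants of $\epsilon$ exactly and compares them with $(-1)^{s+1} B_s k^{2s}(1 - m^{-2s})/(2s)$ for $s = 1, 2$. All function names are hypothetical.

```python
from fractions import Fraction


def grouping_error_values(k, m):
    """The m equally likely values of eps for class width k (spacing k/m)."""
    step = Fraction(k, m)
    return [(i - Fraction(m - 1, 2)) * step for i in range(m)]


def second_and_fourth_seminvariants(values):
    """Return (kappa_2, kappa_4) of a uniform distribution on `values`."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count
    m4 = sum((v - mean) ** 4 for v in values) / count
    return m2, m4 - 3 * m2 ** 2


def grouping_term(s, k, m):
    """(-1)^(s+1) * B_s * k^(2s) / (2s) * (1 - 1/m^(2s)) for s = 1 or 2."""
    bernoulli = {1: Fraction(1, 6), 2: Fraction(1, 30)}
    sign = 1 if s % 2 == 1 else -1
    return sign * bernoulli[s] * Fraction(k) ** (2 * s) / (2 * s) * (1 - Fraction(1, m ** (2 * s)))


kappa2, kappa4 = second_and_fourth_seminvariants(grouping_error_values(3, 3))
print(kappa2, grouping_term(1, 3, 3))
print(kappa4, grouping_term(2, 3, 3))
```

For $k = m = 3$ the two seminvariants are $2/3$ and $-2/3$, which are the quantities $(1 - 1/m^2)k^2/12$ and $-(1 - 1/m^4)k^4/120$.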

We may without loss of generality take the origin for $x$ at the mean of the 
ungrouped distribution so that $\lambda_1 = 0$. Further it is easy to see that 

$$\lambda_{1r} = 0, \qquad r = 0, 1, 2, 3, \cdots.$$

Consider 

$$E[(x_i - \bar{x})\,\bar{x}^r] = P_{1r}.$$

For a fixed $\bar{x}$, i.e., for a given grouping, this becomes 

$$\bar{x}^r E(x_i - \bar{x}) = 0.$$

Then since $P_{1r}$ is the average of this over all groupings with a given class interval, 
$P_{1r} = 0$, and from the expression for $\lambda_{1r}$ in terms of the moments $P_{rs}$ it is obvious 
that also $\lambda_{1r} = 0$. 

Then we must also have $\lambda_{01} = 0$ as is otherwise obvious and (7) can be rewritten 

$$(9)\quad 1 + (\lambda_{20} + \lambda_{02})\frac{\vartheta^2}{2!} + (\lambda_{30} + 3\lambda_{21} + \lambda_{03})\frac{\vartheta^3}{3!} + \cdots.$$


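Now from (8) and (9) by equating coefficients of like powers of $\vartheta$, we get the 
set of formulae: 

$$(10)\quad \begin{aligned} \lambda_1 &= 0 \\ \lambda_2 &= \lambda_{20} + \lambda_{02} - \left(1 - \frac{1}{m^2}\right)\frac{k^2}{12} \\ \lambda_3 &= \lambda_{30} + 3\lambda_{21} + \lambda_{03} \\ \lambda_4 &= \lambda_{40} + 4\lambda_{31} + 6\lambda_{22} + \lambda_{04} + \left(1 - \frac{1}{m^4}\right)\frac{k^4}{120} \end{aligned}$$

These formulae, however, do not give the sought Sheppard's corrections for 
seminvariants calculated from grouped distributions of a discrete variate. See 
below. 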

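Referring to formula (10), p. 58 of the author's paper cited [3], it is easily seen 
by comparison that the required moment formulae are obtained from the general 
formula 

(11) $\mu_n =$ 

in which $\alpha_s$ is given by formula (9) of this former paper. For $n = 1, 2, 3, 4$ 
we write down immediately 

$$(12)\quad \begin{aligned} \mu_1 &= 0 \quad (P_{10} = P_{01} = 0) \\ \mu_2 &= P_{20} + P_{02} - \left(1 - \frac{1}{m^2}\right)\frac{k^2}{12} \\ \mu_3 &= P_{30} + 3P_{21} + P_{03} \\ \mu_4 &= P_{40} + 4P_{31} + 6P_{22} + P_{04} - \left(1 - \frac{1}{m^2}\right)(P_{20} + P_{02})\frac{k^2}{2} + \left(1 - \frac{1}{m^2}\right)\left(7 - \frac{3}{m^2}\right)\frac{k^4}{240} \end{aligned}$$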

In these formulae, $P_{r0}$ is, of course, the average value of $r$th central moments 
about the means of grouped distributions. From the definition $P_{rs}$ $(s \ne 0)$ is the 
average value of the product of the $r$th central moment of a grouped distribution 
by the $s$th power of the mean of the same grouped distribution. Also, it must 
be noted that in the formulae (10) the $\lambda_{rs}$'s there are to be calculated by the 
usual formulae from the moments, $P_{rs}$, and are not themselves the average values 
of like seminvariants calculated from the separate grouped distributions. Thus 
though the formulae (12) give the sought Sheppard's corrections for moments, 
the formulae (10) do not do the like for seminvariants in general. However, 
since in each grouped distribution, 

$$\lambda_2 = \nu_2$$

and 

$$\lambda_3 = \nu_3$$

we have, taking the expectation or average value over the grouped distributions, 

$$E(\lambda_2) = E(\nu_2) = P_{20} = \lambda_{20}$$

and 

$$E(\lambda_3) = E(\nu_3) = P_{30} = \lambda_{30},$$

and the first two formulae of (10) do give the Sheppard's corrections for $\lambda_2$ and 
$\lambda_3$ calculated from grouped distributions of a discrete variate. 

But the case for $\lambda_4$ is different. In each grouped distribution, 

$$\lambda_4 = \nu_4 - 3\nu_2^2,$$

and if we define $l_r$ by 

$$E(\lambda_r) = l_r,$$

we have 

$$l_4 = P_{40} - 3E(\nu_2^2) = P_{40} - 3(P_{20}^2 + \nu_{2:\nu_2}) = \lambda_{40} - 3\nu_{2:\nu_2},$$

if $\nu_{2:\nu_2}$ is the variance of $\nu_2$ in the grouped distributions. 

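In a similar way one can obtain such formulae for seminvariants as may be 
required. Through the sixth, the formulae for the Sheppard's corrections for 
the seminvariants calculated from a grouped distribution of a discrete variate are: 

$$(13)\quad \begin{aligned} \lambda_3 &= l_3 + 3\lambda_{21} + \lambda_{03} \\ \lambda_4 &= l_4 + 3\nu_{2:\nu_2} + 4\lambda_{31} + 6\lambda_{22} + \lambda_{04} + \left(1 - \frac{1}{m^4}\right)\frac{k^4}{120} \\ \lambda_5 &= l_5 + 10\nu_{11:\nu_2\nu_3} + 5\lambda_{41} + 10\lambda_{32} + 10\lambda_{23} + \lambda_{05} \\ \lambda_6 &= l_6 + 15\nu_{11:\nu_2\nu_4} + 10\nu_{2:\nu_3} - 30\nu_{3:\nu_2} - 90\,l_2\,\nu_{2:\nu_2} \\ &\quad + 6\lambda_{51} + 15\lambda_{42} + 20\lambda_{33} + 15\lambda_{24} + \lambda_{06} - \left(1 - \frac{1}{m^6}\right)\frac{k^6}{252} \end{aligned}$$

In these formulae, $\nu_{ij:\nu_r\nu_s}$ is the $ij$'th central product moment of $\nu_r$ and $\nu_s$ in the 
grouped distributions. 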

To illustrate these formulae numerically and to facilitate comparison with 
Pierce's results, we will use the example he chose. His ungrouped distribution 
was: 

     v     f    |    v     f    |    v     f
     1     2    |    4    30    |    7     1
     2     8    |    5     4    |    8     1
     3    10    |    6     3    |    9     1



From this the following three grouped distributions with $k = 3$ can be formed: 

          (1)                   (2)                   (3)
     class      f          class      f          class      f
     1 to 3    20          0 to 2    10         -1 to 1     2
     4 to 6    37          3 to 5    44          2 to 4    48
     7 to 9     3          6 to 8     5          5 to 7     8
    10 to 12    0          9 to 11    1          8 to 10    2
With origin at $v = 4$, we have the following table of moment characteristics 
of these four distributions: 

  Distribution   $\nu_1'$    $\nu_2 = \lambda_2$   $\nu_3 = \lambda_3$   $\nu_4$            $\lambda_4$             $\bar{x} = \nu_1' - (-\tfrac{1}{6})$
  (1)             9/60       9819/60^2     -17442/60^3    238849317/60^4    -50388966/60^4     19/60
  (2)            -9/60      10179/60^2     567162/60^3    557840277/60^4    247004154/60^4      1/60
  (3)           -30/60       8820/60^2    1317600/60^3    528282000/60^4    294904800/60^4    -20/60
  Average       -10/60       9606/60^2     622440/60^3    441657198/60^4    163839996/60^4

                 $\mu_1'$    $\mu_2 = \lambda_2$   $\mu_3 = \lambda_3$   $\mu_4$            $\lambda_4$
  Original
  Distribution  -10/60       7460/60^2     642400/60^3    305034000/60^4    138079200/60^4


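From the table, 

$$P_{20} = \lambda_{20} = \frac{9606}{60^2}, \qquad P_{30} = \lambda_{30} = \frac{622440}{60^3}, \qquad P_{40} = \lambda_{40} + 3\lambda_{20}^2 = \frac{441657198}{60^4}.$$

We further compute: 

$$P_{02} = \frac{\sum(\bar{x}^2)}{3} = \frac{254}{60^2} = \lambda_{02}, \qquad P_{03} = \frac{-380}{60^3}, \qquad P_{04} = \frac{96774}{60^4},$$

$$P_{21} = \frac{\sum(\nu_2\bar{x})}{3} = \frac{6780}{60^3} = \lambda_{21}, \qquad P_{31} = \frac{-8705412}{60^4}, \qquad P_{22} = \frac{2360946}{60^4},$$

$$\lambda_{22} = P_{22} - P_{20}P_{02} = \frac{-78978}{60^4}, \qquad \lambda_{04} = P_{04} - 3P_{02}^2 = \frac{-96774}{60^4},$$

$$\nu_{2:\nu_2} = \frac{\sum(\nu_2^2)}{3} - P_{20}^2 = \frac{330498}{60^4},$$

$$l_4 = \lambda_{40} - 3\nu_{2:\nu_2} = P_{40} - 3P_{20}^2 - 3\nu_{2:\nu_2} = \frac{163839996}{60^4},$$

$$\left(1 - \frac{1}{m^2}\right)\frac{k^2}{12} = \frac{2}{3}, \qquad \left(1 - \frac{1}{m^2}\right)\left(7 - \frac{3}{m^2}\right)\frac{k^4}{240} = 2.$$
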

With these values one may check the formulae (12) and (13) as far as weight 
four. For example: 

$$\mu_2 = \frac{9606}{60^2} + \frac{254}{60^2} - \frac{2}{3} = \frac{7460}{60^2}$$

$$\lambda_4 = \frac{1}{60^4}(163839996 + 991494 - 34821648 - 473868 - 96774 + 8640000) = \frac{138079200}{60^4}$$

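
The numerical illustration can be reproduced with exact rational arithmetic. The following is a sketch under stated assumptions rather than the author's own computation: the three groupings are taken to start at $v = 1$, $0$ and $-1$ with class width $k = 3$ (so $m = 3$), class marks are the class centres, and all variable and function names are illustrative. It checks the weight-two and weight-three relations of (12) and the weight-four relation of (13).

```python
from fractions import Fraction

# Ungrouped distribution of the example: value v -> frequency f
UNGROUPED = {1: 2, 2: 8, 3: 10, 4: 30, 5: 4, 6: 3, 7: 1, 8: 1, 9: 1}
K = 3  # class width
M = 3  # number of consecutive values grouped in each class


def mean(dist):
    total = sum(dist.values())
    return sum(Fraction(v) * f for v, f in dist.items()) / total


def central_moment(dist, r):
    """r-th moment of `dist` about its own mean."""
    centre = mean(dist)
    total = sum(dist.values())
    return sum(f * (Fraction(v) - centre) ** r for v, f in dist.items()) / total


def group(dist, first_lower_limit, k):
    """Group into classes of width k; keys of the result are the class marks."""
    grouped = {}
    for v, f in dist.items():
        index = (v - first_lower_limit) // k
        mark = Fraction(2 * (first_lower_limit + index * k) + k - 1, 2)
        grouped[mark] = grouped.get(mark, 0) + f
    return grouped


def average(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


mu2, mu3, mu4 = (central_moment(UNGROUPED, r) for r in (2, 3, 4))
lam4 = mu4 - 3 * mu2 ** 2

groupings = [group(UNGROUPED, start, K) for start in (1, 0, -1)]
nu2 = [central_moment(g, 2) for g in groupings]
nu3 = [central_moment(g, 3) for g in groupings]
nu4 = [central_moment(g, 4) for g in groupings]
xbar = [mean(g) - mean(UNGROUPED) for g in groupings]  # grouped means, origin at true mean

P20, P30 = average(nu2), average(nu3)
P02 = average(x ** 2 for x in xbar)
P03 = average(x ** 3 for x in xbar)
P04 = average(x ** 4 for x in xbar)
P21 = average(a * x for a, x in zip(nu2, xbar))
P31 = average(a * x for a, x in zip(nu3, xbar))
P22 = average(a * x ** 2 for a, x in zip(nu2, xbar))
lam22 = P22 - P20 * P02
lam04 = P04 - 3 * P02 ** 2
var_nu2 = average(a ** 2 for a in nu2) - P20 ** 2
l4 = average(a4 - 3 * a2 ** 2 for a2, a4 in zip(nu2, nu4))

c2 = (1 - Fraction(1, M ** 2)) * Fraction(K ** 2, 12)
c4 = (1 - Fraction(1, M ** 4)) * Fraction(K ** 4, 120)

print(mu2 == P20 + P02 - c2)
print(mu3 == P30 + 3 * P21 + P03)
print(lam4 == l4 + 3 * var_nu2 + 4 * P31 + 6 * lam22 + lam04 + c4)
```

Because the three groupings exhaust the possible positions of the class limits for integer data with $k = 3$, the averages here are complete and the three relations hold exactly.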


It may appear at first glance that since 

$$P_{rs} = E[\nu_r(\bar{x})^s]$$

and could be expressed by means of the notation, $\nu_{ij:\nu_r\bar{x}}$, the notation in (12) 
and (13) could be made more uniform. It could be but at the expense of greater 
complexity in these two sets of results. Moreover, it is convenient that $\lambda_{rs}$ 
is expressible in terms of $P_{ij}$'s in precisely the same way that product 
seminvariants are ordinarily expressible in terms of product moments. 

Pierce's results differ from the above not only in their mode of derivation but 
also in the fact that they express $P_{r0}$'s and $l_r$'s in terms of the characteristics of 
the ungrouped distribution and moments and seminvariants of moments in the 
grouped distributions. Thus as they stand they are not formulae for Sheppard's 
corrections. 

Finally it must be remarked that in comparison with the usual formulae for 
Sheppard’s corrections, the formulae (10) and (13) introduce quantities the 
magnitudes of which are not known in general except that ordinarily they are 
quite small. It is hoped that results on this point will be forthcoming soon. 
ON THE ANALYSIS OF VARIANCE IN CASE OF MULTIPLE 
CLASSIFICATIONS WITH UNEQUAL CLASS FREQUENCIES 

By Abraham Wald 1 
Columbia University 

In a previous paper 2 the author considered the case of a single criterion of 
classification with unequal class frequencies and derived confidence limits for 
$\sigma'^2/\sigma^2$ where $\sigma'^2$ denotes the variance associated with the classification, and $\sigma^2$ 
denotes the residual variance. The scope of the present paper is to extend 
those results to the case of multiple classifications with unequal class frequencies. 

For the sake of simplicity of notations we will derive the required confidence 
limits in the case of a two-way classification, the extension to multiple classifica¬ 
tions being obvious. 

Consider a two-way classification with $p$ rows and $q$ columns. Let $y$ be the 
observed variable, and let $n_{ij}$ be the number of observations in the $i$th row and 
$j$th column. Denote by $y_{ij}^{(k)}$ the $k$th observation on $y$ in the $i$th row and $j$th 
column $(k = 1, \cdots, n_{ij})$. Let the total number of observations be $N$. We 
order the $N$ observations and let $y_\alpha$ be the $\alpha$th observation on $y$ in that order. 
Consider the variables: 

$$t,\ t_1, \cdots, t_p,\ v_1, \cdots, v_q,$$

and denote by $t_\alpha$ the $\alpha$th observation on $t$, by $t_{i\alpha}$ the $\alpha$th observation on $t_i$ and 
by $v_{j\alpha}$ the $\alpha$th observation on $v_j$. The values of $t_\alpha$, $t_{i\alpha}$ and $v_{j\alpha}$ are defined as 
follows: 

$t_\alpha = 1$ $(\alpha = 1, \cdots, N)$, 

$t_{i\alpha} = 1$ if $y_\alpha$ lies in the $i$th row, 

$t_{i\alpha} = 0$ if $y_\alpha$ does not lie in the $i$th row, 

$v_{j\alpha} = 1$ if $y_\alpha$ lies in the $j$th column, 

$v_{j\alpha} = 0$ if $y_\alpha$ does not lie in the $j$th column. 

We make the assumptions 

$$y_{ij}^{(k)} = x_{ij}^{(k)} + \epsilon_i + \eta_j,$$

where the variates $x_{ij}^{(k)}$, $\epsilon_i$, $\eta_j$ $(i = 1, \cdots, p;\ j = 1, \cdots, q;\ k = 1, \cdots, n_{ij})$ 
are independently and normally distributed, the variance of $x_{ij}^{(k)}$ is $\sigma^2$, the 
variance of $\epsilon_i$ is $\sigma'^2$, the variance of $\eta_j$ is $\sigma''^2$, and the mean values of $\epsilon_i$ and $\eta_j$ are 
zero. 


1 Research under a grant-in-aid from the Carnegie Corporation of New York. 

2 “A note on the analysis of variance with unequal class frequencies,” Annals of Math. 
Stat., Vol. 11 (1940). 
Let the sample regression of $y$ on $t, t_1, \cdots, t_{p-1}, v_1, \cdots, v_{q-1}$ be 

$$Y = at + b_1 t_1 + \cdots + b_{p-1}t_{p-1} + d_1 v_1 + \cdots + d_{q-1}v_{q-1}.$$

We want to derive confidence limits for $\sigma'^2/\sigma^2 = \lambda^2$. 
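Let us introduce the notations: 

$$\begin{aligned} \sum_\alpha t_{i\alpha} &= a_{0i} && (i = 1, \cdots, p-1), \\ \sum_\alpha v_{j\alpha} &= a_{0\,p-1+j} && (j = 1, \cdots, q-1), \\ \sum_\alpha t_{i\alpha}t_{j\alpha} &= a_{ij} && (i, j = 1, \cdots, p-1), \\ \sum_\alpha t_{i\alpha}v_{j\alpha} &= a_{i\,p-1+j} && (i = 1, \cdots, p-1;\ j = 1, \cdots, q-1), \\ \sum_\alpha v_{i\alpha}v_{j\alpha} &= a_{p-1+i\,p-1+j} && (i, j = 1, \cdots, q-1), \\ \|c_{ij}\| &= \|a_{ij}\|^{-1} && (i, j = 0, 1, \cdots, p+q-2). \end{aligned}$$
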

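Let the regression of $x_{ij}^{(k)}$ on $t, t_1, \cdots, t_{p-1}, v_1, \cdots, v_{q-1}$ be 

$$X = a^* t + b_1^* t_1 + \cdots + b_{p-1}^* t_{p-1} + d_1^* v_1 + \cdots + d_{q-1}^* v_{q-1}.$$

The regression of $\epsilon_i + \eta_j$ on the same independent variables is evidently equal to 

$$\epsilon_1 t_1 + \cdots + \epsilon_p t_p + \eta_1 v_1 + \cdots + \eta_q v_q = (\eta_q + \epsilon_p)t + (\epsilon_1 - \epsilon_p)t_1 + \cdots + (\epsilon_{p-1} - \epsilon_p)t_{p-1} + (\eta_1 - \eta_q)v_1 + \cdots + (\eta_{q-1} - \eta_q)v_{q-1},$$

since $t_p = t - t_1 - \cdots - t_{p-1}$ and $v_q = t - v_1 - \cdots - v_{q-1}$. Hence 

$$(1)\quad b_i = b_i^* + (\epsilon_i - \epsilon_p), \qquad (i = 1, \cdots, p-1),$$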


and therefore 

$$\sigma_{b_i b_j} = \sigma_{b_i^* b_j^*} + (1 + \delta_{ij})\sigma'^2 = c_{ij}\sigma^2 + (1 + \delta_{ij})\sigma'^2 = [c_{ij} + (1 + \delta_{ij})\lambda^2]\sigma^2, \qquad (i, j = 1, \cdots, p-1),$$

where $\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 0$ for $i \ne j$ and $\delta_{ii} = 1$. Denote 
$c_{ij} + (1 + \delta_{ij})\lambda^2$ by $c'_{ij}$. Since the expected value of $b_i^*$ is equal to zero, on 
account of (1) also the expected value of $b_i$ is equal to zero. Let 

$$\|g_{ij}\| = \|c'_{ij}\|^{-1}, \qquad (i, j = 1, \cdots, p-1).$$

Then 

$$(3)\quad \frac{1}{\sigma^2}\sum_{i=1}^{p-1}\sum_{j=1}^{p-1} g_{ij}b_i b_j$$

has the $\chi^2$-distribution with $p - 1$ degrees of freedom. The expression 

$$(4)\quad \frac{1}{\sigma^2}\sum_{\alpha=1}^{N}(y_\alpha - Y_\alpha)^2$$

has the $\chi^2$-distribution with $N - p - q + 1$ degrees of freedom. The expressions 
(3) and (4) are independently distributed. Hence 

$$(5)\quad \frac{N - p - q + 1}{p - 1}\cdot\frac{\sum\sum g_{ij}b_i b_j}{\sum(y_\alpha - Y_\alpha)^2}$$

has the $F$-distribution (analysis of variance distribution). We will now show 
that (5) is a monotonic function of $\lambda^2$. It is known that $\sum\sum g_{ij}b_i b_j$ is invariant 
under linear transformations, i.e. 

2 ZgiMi = 22 f g’ijb'tb'i , 

where is an arbitrary linear function, say m\bi + • • • + mp-ibp-i of 6 i, • • • , 
V-x (i « 1 , • • • , p - 1 ) and 

Ill'll = iiatwir. 

We can choose the matrix || || such that 

«< = Wi(«i — «*) + ••• + wh(vi ~ «j>)> (* = 1, • • • , p — 1), 

are independently distributed and vj; = o-' 2 . The coefficients of course do 
not depend on o'. We have 

Ohlbj = fffci'i}' + i.y<r' 2 , (5<, = Kronecker delta). 


Now let 


= ViifeJ + • • • + Vip-ibp..!, (v = 1, • •. , p — 1), 

where || va\\ is an orthogonal matrix and is chosen such that b*", • •• , b*'h 
are independently distributed. On account of the orthogonality of || v,-,-1| we 
obviously have 

<r*i' =* o\\" 4- <r' 2 ; Ob jv/ = 0 for i ^ j. 


Hence 


( 6 ) 


£ £ QiMi - 


T—^L— 

h o\r + xV 


The right hand side of (6) is evidently a monotonic function of $\lambda^2$ which proves 
our statement. The endpoints of the confidence interval for $\lambda^2$ are the roots 
in $\lambda^2$ of the equations 

$$(7)\quad \frac{N - p - q + 1}{p - 1}\cdot\frac{\sum\sum g_{ij}b_i b_j}{\sum(y_\alpha - Y_\alpha)^2} = F_1; \qquad \frac{N - p - q + 1}{p - 1}\cdot\frac{\sum\sum g_{ij}b_i b_j}{\sum(y_\alpha - Y_\alpha)^2} = F_2,$$

where $F_2$ denotes the upper, and $F_1$ the lower critical value of $F$. 
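
To make the root-finding in (7) concrete, here is a minimal numerical sketch for the case $p - 1 = 2$. Everything numerical in it is an assumption made for illustration: the matrix `c`, the coefficients `b`, the residual sum of squares, the degrees of freedom, and the two critical values are placeholders and are not taken from any table or from the paper. The sketch only uses the facts stated above, namely that $\|g_{ij}\|$ is the inverse of $\|c_{ij} + (1 + \delta_{ij})\lambda^2\|$ and that (5) is monotonic in $\lambda^2$, so that a bisection finds each root when one exists.

```python
def quadratic_form(c, b, lam2):
    """b' G b, where G is the inverse of the 2x2 matrix c_ij + (1 + delta_ij) * lam2."""
    a11 = c[0][0] + 2.0 * lam2
    a12 = c[0][1] + lam2
    a22 = c[1][1] + 2.0 * lam2
    det = a11 * a22 - a12 * a12
    return (a22 * b[0] ** 2 - 2.0 * a12 * b[0] * b[1] + a11 * b[1] ** 2) / det


def statistic(c, b, residual_ss, residual_dof, lam2):
    """Expression (5) as a function of lam2, with p - 1 = 2."""
    return residual_dof / 2.0 * quadratic_form(c, b, lam2) / residual_ss


def root_in_lam2(c, b, residual_ss, residual_dof, critical_value, upper=1e6):
    """Bisection for statistic(lam2) = critical_value on [0, upper]; None if no root."""
    low, high = 0.0, upper
    if statistic(c, b, residual_ss, residual_dof, low) < critical_value:
        return None  # the statistic is already below the target at lam2 = 0
    if statistic(c, b, residual_ss, residual_dof, high) > critical_value:
        return None  # no crossing inside the search interval
    for _ in range(200):
        mid = 0.5 * (low + high)
        if statistic(c, b, residual_ss, residual_dof, mid) > critical_value:
            low = mid  # statistic decreases in lam2, so the root lies to the right
        else:
            high = mid
    return 0.5 * (low + high)


# Illustrative inputs only (hypothetical values).
c = [[0.5, 0.1], [0.1, 0.4]]
b = [1.2, -0.7]
residual_ss, residual_dof = 30.0, 20
F1, F2 = 0.05, 3.5  # placeholder lower and upper critical values

upper_limit = root_in_lam2(c, b, residual_ss, residual_dof, F1)
lower_limit = root_in_lam2(c, b, residual_ss, residual_dof, F2)
print(lower_limit, upper_limit)
```

Since the statistic decreases as $\lambda^2$ increases, the upper critical value yields the lower confidence limit and conversely. With these made-up inputs the statistic at $\lambda^2 = 0$ is already below `F2`, so no positive root exists for the lower limit and the function returns `None`.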
The derivation of the required confidence limits in case of classifications in 
more than two ways can be carried out in the same way and I shall merely state 
here the results. 

Consider $r$ criteria of classification and denote by $p_u$ the number of classes 
in the $u$th classification $(u = 1, \cdots, r)$. Denote by $n_{i_1\cdots i_r}$ the number of 
observations which belong to the $i_1$th class of the first classification, $i_2$th class 
of the second classification, $\cdots$, and to the $i_r$th class of the $r$th classification. 
Let $y_{i_1\cdots i_r}^{(k)}$ be the $k$th observation on $y$ in the set of observations belonging to 
the classes mentioned above $(k = 1, \cdots, n_{i_1\cdots i_r})$. We make the assumption 

$$y_{i_1\cdots i_r}^{(k)} = x_{i_1\cdots i_r}^{(k)} + \epsilon_{i_1}^{(1)} + \cdots + \epsilon_{i_r}^{(r)},$$

where the variates 

$$x_{i_1\cdots i_r}^{(k)},\ \epsilon_{i_1}^{(1)}, \cdots, \epsilon_{i_r}^{(r)} \qquad (i_u = 1, \cdots, p_u;\ u = 1, \cdots, r;\ k = 1, \cdots, n_{i_1\cdots i_r})$$

are independently and normally distributed, the variance of $x_{i_1\cdots i_r}^{(k)}$ is $\sigma^2$, the 
variance of $\epsilon_{i_u}^{(u)}$ is $\sigma_u^2$ and the mean value of $\epsilon_{i_u}^{(u)}$ is zero $(i_u = 1, \cdots, p_u$; 
$u = 1, \cdots, r)$. 

Let $N$ be the total number of observations. We order the observations in a 
certain order and denote by $y_\alpha$ the $\alpha$th observation in that order $(\alpha = 1, \cdots, N)$. 
Consider the variables: 

$$t,\ t_{i_u}^{(u)} \qquad (u = 1, \cdots, r;\ i_u = 1, \cdots, p_u),$$

and denote by $t_\alpha$ the $\alpha$th observation on $t$ and by $t_{i_u\alpha}^{(u)}$ the $\alpha$th observation on 
$t_{i_u}^{(u)}$. The values of $t_\alpha$ and $t_{i_u\alpha}^{(u)}$ are given as follows: 

$t_\alpha = 1$ $(\alpha = 1, \cdots, N)$, 

$t_{i_u\alpha}^{(u)} = 1$ if $y_\alpha$ lies in the $i_u$th class of the $u$th classification, 

$t_{i_u\alpha}^{(u)} = 0$ if $y_\alpha$ does not lie in the $i_u$th class of the $u$th classification. 

Let the sample regression of $y$ on $t$, $t_{i_u}^{(u)}$ $(u = 1, \cdots, r;\ i_u = 1, \cdots, p_u - 1)$ be given by 

$$Y = at + \sum_{u=1}^{r}\sum_{i_u=1}^{p_u-1} b_{i_u}^{(u)}t_{i_u}^{(u)}.$$

Let the covariance of $b_{i_u}^{(u)}$ and $b_{j_u}^{(u)}$ be given by $c_{i_u j_u}^{(u)}\sigma^2$ under the assumption 
that $\sigma_1 = \sigma_2 = \cdots = \sigma_r = 0$. The matrix $\|c_{i_u j_u}^{(u)}\|$ $(i_u, j_u = 1, \cdots, p_u - 1)$ 
can be calculated by known methods of the theory of least squares. Let 

$$\|g_{i_u j_u}^{(u)}\| = \|c_{i_u j_u}^{(u)} + (1 + \delta_{i_u j_u})\lambda_u^2\|^{-1} \qquad (i_u, j_u = 1, \cdots, p_u - 1),$$

where $\delta_{i_u j_u}$ is the Kronecker delta and $\lambda_u^2 = \sigma_u^2/\sigma^2$. Then the lower and upper 
confidence limits for $\lambda_u^2$ are given by the roots in $\lambda_u^2$ of the equations 

$$(8)\quad \frac{N - \sum_{u=1}^{r} p_u + r - 1}{p_u - 1}\cdot\frac{\sum_{j_u=1}^{p_u-1}\sum_{i_u=1}^{p_u-1} g_{i_u j_u}^{(u)}b_{i_u}^{(u)}b_{j_u}^{(u)}}{\sum_{\alpha=1}^{N}(y_\alpha - Y_\alpha)^2} = F_i \qquad (i = 1, 2),$$

where $F_2$ is the upper and $F_1$ the lower critical value of the analysis of variance 
distribution with $p_u - 1$ and $N - \sum_{u=1}^{r} p_u + r - 1$ degrees of freedom. In 
case of a single criterion of classification the confidence limits (8) are identical 
with those given in my previous paper. 

THE FREQUENCY DISTRIBUTION OF A GENERAL MATCHING 

PROBLEM 

By T. N. E. Greville 
Bureau of the Census 

1. Introduction. This paper considers the matching of two decks of cards of 
arbitrary composition, and the complete frequency distribution of correct 
matchings is obtained, thus solving a problem proposed by Stevens. 1 It is also 
shown that the results can be interpreted in terms of a contingency table. 

Generalizing a problem considered by Greenwood, 2 let us consider the matching 
of two decks of cards consisting of $t$ distinct kinds, all the cards of each kind being 
identical. The first or “call” deck will be composed of $i_1$ cards of the first kind, 
$i_2$ of the second, etc., such that 

$$i_1 + i_2 + i_3 + \cdots + i_t = n;$$

and the second or “target” deck will contain $j_1$ cards of the first kind, $j_2$ of the 
second, etc., such that 

$$j_1 + j_2 + \cdots + j_t = n.$$

Any of the $i$'s or $j$'s may be zero. It is desired to calculate, for a given 
arrangement of the “call” deck, the number of possible arrangements of the “target” 
deck which will produce exactly $r$ matchings between them $(r = 0, 1, 2, \cdots, n)$. 
It is clear that these frequencies are independent of the arrangement of the call 
deck. For convenience the call deck may be thought of as arranged so that all 
the cards of the first kind come first, followed by all those of the second kind, 
and so on. 


2. Formulae for the frequencies. Let us consider the number of 
arrangements of the target deck which will match the cards in the $k_1$th, $k_2$th, $\cdots$, $k_s$th 
positions in the call deck, regardless of whether or not matchings occur elsewhere. 
Let the cards in these $s$ positions in the call deck consist of $c_1$ of the first kind, 
$c_2$ of the second, etc. Then: 

$$c_1 + c_2 + \cdots + c_t = s.$$

The number of such arrangements of the target deck is 

$$(1)\quad \frac{(n - s)!}{\prod_{h=1}^{t}(j_h - c_h)!}.$$

1 W. L. Stevens, Annals of Eugenics, Vol. 8 (1937), pp. 238-244. 

2 J. A. Greenwood, Annals of Math. Stat., Vol. 9 (1938), pp. 58-59. 
For fixed values of the $c$'s, the $s$ specified positions may be selected in 

$$(2)\quad \prod_{h=1}^{t}\frac{i_h!}{c_h!\,(i_h - c_h)!}$$

ways. 

Consider now the expression 

$$(3)\quad V_s = \sum\frac{(n - s)!\ \prod_{h=1}^{t} i_h!}{\prod_{h=1}^{t} c_h!\,(i_h - c_h)!\,(j_h - c_h)!}$$

obtained by summing the product of (1) and (2) over all sets of values of the 
numbers $c_1, c_2, \cdots, c_t$ satisfying the conditions: 

$$0 \le c_h \le i_h, \qquad c_h \le j_h, \qquad \text{and} \qquad \sum_{h=1}^{t} c_h = s.$$


Let $W_s$ denote the number of arrangements of the target deck which result in 
exactly $s$ matchings. Then it is evident that $V_s$ exceeds $W_s$, since the former 
includes those arrangements which give more than $s$ matchings, and these, 
moreover, are counted more than once. Consider an arrangement which 
produces $u$ matchings, where $u > s$. Such an arrangement will be counted 
once in $V_s$ for every set of $s$ matchings which can be selected from the total of 
$u$, that is ${}^{u}C_{s}$ times. In other words, 

$$V_r = W_r + {}^{r+1}C_r W_{r+1} + {}^{r+2}C_r W_{r+2} + \cdots + {}^{n}C_r W_n.$$

It has been shown 3 that the solution of these equations is 

$$(4)\quad W_r = V_r - {}^{r+1}C_r V_{r+1} + {}^{r+2}C_r V_{r+2} - \cdots + (-1)^{n-r}\,{}^{n}C_r V_n.$$

3. Computation of the frequencies. Equations (3) and (4) apparently give 
the solution of the problem, but in practice the labor of carrying out the 
summation indicated in (3) would often be very great. However, (3) may be 
rewritten in the form 

$$(5)\quad V_s = \frac{(n - s)!\,H_s}{\prod_{h=1}^{t} j_h!},$$

where 

$$H_s = \sum\prod_{h=1}^{t}\frac{i_h!\,j_h!}{c_h!\,(i_h - c_h)!\,(j_h - c_h)!}.$$

3 H. Geiringer, Annals of Math. Stat., Vol. 9 (1938), p. 282. 

It will be seen that $H_s$ is the coefficient of $x^s$ in the product 

$$(6)\quad \prod_{h=1}^{t}\left(\sum_{c=0}^{l_h}\frac{i_h!\,j_h!\,x^c}{c!\,(i_h - c)!\,(j_h - c)!}\right),$$

where $l_h$ denotes the smaller of $i_h$ and $j_h$. The factor $\prod_{h=1}^{t} j_h!$ was included in 
$H_s$ in order to make the coefficients in the polynomials of (6) always integers. 
Equation (4) may now be written in the form 

$$W_r = \sum_{s=r}^{n}(-1)^{s-r}\,{}^{s}C_r\,\frac{(n - s)!\,H_s}{\prod_{h=1}^{t} j_h!}$$

or 

$$(7)\quad W_r = \frac{1}{r!\,\prod_{h=1}^{t} j_h!}\sum_{s=r}^{n}(-1)^{s-r}\,\frac{s!\,(n - s)!}{(s - r)!}\,H_s,$$

a form which lends itself to actual computation. 

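4. Factorial moments. The factorial moments of the frequency distribution 
of the number of matchings are easy to compute. Let $m_s$ denote the $s$th factorial 
moment, so that 

$$(8)\quad m_s = \frac{\sum_{r=0}^{n} r^{(s)}W_r}{\sum_{r=0}^{n} W_r}.$$

Substituting from (4) 

$$\sum_{r=0}^{n} r^{(s)}W_r = \sum_{r=s}^{n}\left\{r^{(s)}\sum_{u=r}^{n}(-1)^{u-r}\,{}^{u}C_r V_u\right\}.$$

Reversing the order of summation and simplifying, 

$$\sum_{r=0}^{n} r^{(s)}W_r = \sum_{u=s}^{n}\left\{u^{(s)}V_u\sum_{r=s}^{u}(-1)^{u-r}\,{}^{u-s}C_{r-s}\right\} = s!\,V_s.$$

Hence, 

$$(9)\quad \sum_{r=0}^{n} W_r = V_0 = \frac{n!}{\prod_{h=1}^{t} j_h!},$$

and from (5) and (8), 

$$(10)\quad m_s = \frac{s!\,(n - s)!\,H_s}{n!}.$$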
5. Mean and variance. From (6) 

$$(11)\quad H_1 = \sum_{h=1}^{t} i_h j_h$$

and 

$$(12)\quad H_2 = \frac{1}{2}\sum_{h=1}^{t} i_h(i_h - 1)j_h(j_h - 1) + \sum_{h<k} i_h i_k j_h j_k.$$

Hence the mean number of matchings is 

$$(13)\quad \frac{\sum i_h j_h}{n}.$$

The variance $\mu_2$ is 

$$m_2 + m_1 - m_1^2 = \frac{1}{n^2(n - 1)}\left[n\sum_{h=1}^{t} i_h(i_h - 1)j_h(j_h - 1) + 2n\sum_{h<k} i_h i_k j_h j_k + n(n - 1)\sum i_h j_h - (n - 1)\left(\sum i_h j_h\right)^2\right],$$

or 

$$(14)\quad \mu_2 = \frac{1}{n^2(n - 1)}\left[\left(\sum i_h j_h\right)^2 - n\sum i_h j_h(i_h + j_h) + n^2\sum i_h j_h\right].$$

In the special case 4 = 4 = • • • = 4 — 4 these formulae become 

" ,=i ' w = ^stij( n ’-S ii )- 

These formulae have previously been given by Stevens, 4 and those for the 
special case also by Greenwood. The maximal conditions for the variance, 
given by Greenwood for this particular case, apparently can not be put in a simple 
form for the general case. 
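
Formulae (13) and (14) are easily compared with the moments of an enumerated distribution. The following sketch uses an arbitrary pair of small decks (an assumption made for illustration) and hypothetical function names; the enumeration is practical only for very small $n$.

```python
from fractions import Fraction
from itertools import permutations


def matching_frequencies(call_counts, target_counts):
    """Number of distinct target arrangements giving r matchings, r = 0..n."""
    call = [kind for kind, count in enumerate(call_counts) for _ in range(count)]
    target = [kind for kind, count in enumerate(target_counts) for _ in range(count)]
    w = [0] * (len(call) + 1)
    for arrangement in set(permutations(target)):
        w[sum(a == b for a, b in zip(call, arrangement))] += 1
    return w


def mean_formula_13(call_counts, target_counts):
    n = sum(call_counts)
    return Fraction(sum(i * j for i, j in zip(call_counts, target_counts)), n)


def variance_formula_14(call_counts, target_counts):
    n = sum(call_counts)
    s = sum(i * j for i, j in zip(call_counts, target_counts))
    cross = sum(i * j * (i + j) for i, j in zip(call_counts, target_counts))
    return Fraction(s * s - n * cross + n * n * s, n * n * (n - 1))


def enumerated_mean_and_variance(call_counts, target_counts):
    w = matching_frequencies(call_counts, target_counts)
    total = sum(w)
    mean = Fraction(sum(r * f for r, f in enumerate(w)), total)
    second = Fraction(sum(r * r * f for r, f in enumerate(w)), total)
    return mean, second - mean ** 2


decks = ((2, 1, 2), (1, 3, 1))  # illustrative call and target compositions
print(mean_formula_13(*decks), variance_formula_14(*decks))
print(enumerated_mean_and_variance(*decks))
```

For these decks both routes give a mean of $7/5$ and a variance of $26/25$.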


6. Unequal decks. Suppose the call deck contains m cards, m < n, and is to 
be matched with m cards selected from the target deck. It can be assumed 
without loss of generality that the first m cards in any arrangement of the target 
deck are the ones to be used. The formulae of this paper can be applied to this 


more general problem by the expedient of imagining $n - m$ blank cards to be 
added at the end of the call deck and regarding these as an additional kind. 
It is thus apparent that formulae (13) and (14) apply without modification to 
this altered situation. 


4 W. L. Stevens, Annals of Eugenics, loc. cit.; Psychol. Review, Vol. 46 (1939), pp. 142-160. 


7. Application to contingency table, Stevens 6 has considered the distribution 
of entries in a contingency table with fixed marginal totals, and has pointed out 
that the problem of matching two decks of cards may be dealt with from that 
standpoint. A contingency table classifies data into n columns and m rows, 
and we may consider the row as indicating the kind of card which occupies 
a given position in the call deck, the columns having the same function with 
respect to the target deck. Stevens defines a quantity c as the sum of entries 
in a prescribed set of cells, subject to the condition that no two cells of the set 
are in the same row or column, and mentions as unsolved the problem of the 
exact sampling distribution of c. 

We now have at our disposal the machinery for solving this problem. Following
Stevens's notation, let $a_1, a_2, \ldots, a_m$ denote the fixed row totals and
$b_1, b_2, \ldots, b_n$ the fixed column totals, while $x_{rs}$ denotes the frequency of the
cell in the rth row and the sth column. Then, let

$$c = \sum_{h=1}^{l} x_{r_h s_h},$$

where l does not exceed either m or n.

Imagine two decks of N cards,

$$N = \sum_{r=1}^{m} a_r = \sum_{s=1}^{n} b_s,$$

the first containing $a_1$ cards of one kind, $a_2$ of another, etc., and the second
containing $b_1$ cards of one kind, $b_2$ of another, etc. Moreover, let the $r_h$th kind
in the first deck and the $s_h$th kind in the second deck be the same kind
(h = 1, 2, ..., l), the other kinds being all different. Evidently c is the number of
matchings between the two decks. Hence, the methods of this paper can be
used to obtain the distribution of c. The formulae we have obtained agree with
those for the expected value and variance of c given by Stevens.
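The correspondence between the table and the two decks can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the deck contents, the kind labels, and the helper names `contingency_table` and `matching_count` are hypothetical and are not taken from Stevens or from the formulae of this paper.

```python
from collections import Counter


def contingency_table(first_deck, second_deck):
    """Cell (r, s) counts the positions holding kind r in the first deck
    and kind s in the second deck."""
    return Counter(zip(first_deck, second_deck))


def matching_count(first_deck, second_deck, same_kind):
    """c = sum of the prescribed cells (r_h, s_h).

    same_kind maps a kind r_h of the first deck to the kind s_h of the
    second deck regarded as the same kind; no row or column is used twice.
    """
    table = contingency_table(first_deck, second_deck)
    return sum(table[(r, s)] for r, s in same_kind.items())


# Hypothetical decks of N = 6 cards. Kinds "A" and "B" occur in both decks,
# kind "C" only in the first deck and kind "D" only in the second.
first = ["A", "A", "B", "C", "B", "A"]
second = ["A", "B", "B", "D", "A", "A"]
same = {"A": "A", "B": "B"}

c_from_table = matching_count(first, second, same)
c_direct = sum(1 for u, v in zip(first, second) if u == v)
print(c_from_table, c_direct)
```

Both counts agree because each matching position contributes exactly one unit to one of the prescribed cells.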


5 W. L. Stevens, Annals of Eugenics, loc. cit.


ON METHODS OF SOLVING NORMAL EQUATIONS

By Paul G. Hoel
University of California, Los Angeles

There seems to be considerable disagreement concerning what is the most
satisfactory method of solving a set of normal equations. Since such information
as errors of estimate and significance of results is usually desired in addition
to the solution, in its broader aspects the problem is one of deciding what is the
most satisfactory method of calculating the inverse of a symmetric matrix.

For equations with several unknowns some compact systematic method of
calculation is necessary to eliminate much of the labor involved in the ordinary
method of calculating the inverse from its definition. Among the more common
of such systematic methods are those associated with the names of Chio, 1 Gauss, 1
Doolittle, 2 and Aitken. 3 In addition, A. A. Albert 4 recently called attention
to a method implicit in elementary matrix theory. There are also various
iterative schemes, and schemes which are but slight variations of the above
methods. In this note only the methods associated with the above names will
be considered, and for convenience they will be labeled with those names,
regardless of who should be given credit for them.

The purpose of this note is to show that when the calculation of the inverse is
systematized, all of the above methods are fundamentally equivalent and merely
involve a different arrangement of work. Consequently, any advantage in
calculating time for any particular method will arise through such features as a
simpler technique or less copying, rather than through fewer multiplications and
divisions.

By the method of Chio is meant the evaluation of determinants by the pivotal
method of reduction. Since all of the methods mentioned above use pivotal
reduction, the method of Chio will not be treated as a distinct method.
Furthermore, since Gauss' method is incorporated in that of Aitken, it will be
necessary to consider only the methods of Aitken, Doolittle, and Albert as distinct.

First consider the method of Albert, which is based on the following matrix
properties. Let the matrix A be subjected to a sequence of row transformations
leading to the matrix A'. Then, writing A = IA, it follows from a theorem in
matrix theory that A' = I'A, and consequently that $A'A^{-1} = I'$. If row
transformations are chosen which make A' = I, then $A^{-1} = I'$. This states
that if the same row transformations are applied to the identity matrix as were
used to reduce A to the identity matrix, then the resulting matrix will be the
desired inverse. The customary manner of reducing A to I is to work for zeros
in columns as follows:




$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\rightarrow
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
0 & \left(a_{22} - a_{12}\dfrac{a_{21}}{a_{11}}\right) & \cdots & \left(a_{2n} - a_{1n}\dfrac{a_{21}}{a_{11}}\right)\\
\vdots & \vdots & & \vdots\\
0 & \left(a_{n2} - a_{12}\dfrac{a_{n1}}{a_{11}}\right) & \cdots & \left(a_{nn} - a_{1n}\dfrac{a_{n1}}{a_{11}}\right)
\end{bmatrix}
$$

where new letters are introduced for new elements after each reduction. After
zeros are obtained below the main diagonal, zeros are obtained above the
diagonal by starting with the last column. If now these operations are performed
in the same order on I, the result will be $A^{-1}$.

1 See, for example, Whittaker and Robinson, The Calculus of Observations, p. 71 and p. 234.

2 See, for example, Croxton and Cowden, Applied General Statistics, 1939, p. 716.

3 Roy. Soc. Edin. Proc., Vol. 57 (1936-37), p. 172.

4 Am. Math. Monthly, Vol. 48, No. 3 (1941), p. 198.
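The procedure just described can be sketched in Python. This is a minimal illustration rather than Albert's own arrangement of the work: it assumes exact rational arithmetic and nonzero pivots (no row interchanges are attempted), and the function name `invert_by_row_operations` and the sample matrix are hypothetical.

```python
from fractions import Fraction


def invert_by_row_operations(a):
    """Reduce A to I by row operations, applying the same operations to I."""
    n = len(a)
    work = [[Fraction(v) for v in row] for row in a]
    inv = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    # First set of operations: zeros below the main diagonal.
    for col in range(n):
        pivot = work[col][col]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        for row in range(col + 1, n):
            factor = work[row][col] / pivot
            work[row] = [x - factor * y for x, y in zip(work[row], work[col])]
            inv[row] = [x - factor * y for x, y in zip(inv[row], inv[col])]

    # Second set: zeros above the diagonal, starting with the last column.
    for col in range(n - 1, -1, -1):
        pivot = work[col][col]
        for row in range(col):
            factor = work[row][col] / pivot
            work[row] = [x - factor * y for x, y in zip(work[row], work[col])]
            inv[row] = [x - factor * y for x, y in zip(inv[row], inv[col])]

    # Scale each row so that the diagonal of the reduced A becomes 1.
    for i in range(n):
        inv[i] = [x / work[i][i] for x in inv[i]]
    return inv


a = [[4, 2, 1], [2, 5, 3], [1, 3, 6]]  # hypothetical symmetric example
a_inv = invert_by_row_operations(a)
print(a_inv)
```

Rational arithmetic is used only so that the product of A with the result can be compared with I exactly; on a calculating machine the same steps would be carried out in decimals.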

Next consider the method of Aitken, which is based on the evaluation of a
bordered determinant, namely,

$$
\begin{vmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} & 0\\
\vdots & & \vdots & & \vdots & \vdots\\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} & 1\\
\vdots & & \vdots & & \vdots & \vdots\\
a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} & 0\\
0 & \cdots & -1 & \cdots & 0 & 0
\end{vmatrix}
= \text{cofactor of } a_{ij}.
$$

To obtain $A^{-1}$ it is merely necessary to evaluate determinants of this type and
divide them by |A|. Aitken's method evaluates all such determinants
simultaneously, using Chio's reduction technique in much the same manner as illustrated
above with Albert's method. Thus,


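$$
\left[\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0\\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1\\ \hline
-1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
0 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & -1 & 0 & 0 & \cdots & 0
\end{array}\right]
$$

$$
\left[\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0\\
0 & b_{22} & \cdots & b_{2n} & -\dfrac{a_{21}}{a_{11}} & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & b_{n2} & \cdots & b_{nn} & -\dfrac{a_{n1}}{a_{11}} & 0 & \cdots & 1\\ \hline
0 & \dfrac{a_{12}}{a_{11}} & \cdots & \dfrac{a_{1n}}{a_{11}} & \dfrac{1}{a_{11}} & 0 & \cdots & 0\\
0 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & -1 & 0 & 0 & \cdots & 0
\end{array}\right],
\qquad b_{ij} = a_{ij} - a_{1j}\frac{a_{i1}}{a_{11}},
$$

and similarly for the succeeding columns, new letters being introduced for the new
elements at each stage.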


When zeros are obtained below the main diagonal to the left of the vertical
dividing line, the matrix in the lower right section will be $A^{-1}$. This follows from
the fact that the elements of this matrix will be the evaluations of bordered
determinants, like those of the previous paragraph, divided by $a_{11}b_{22}\cdots = |A|$.
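A short Python sketch of the bordered scheme may make the statement concrete. It is an illustration only, not Aitken's arrangement for a calculating machine: it assumes nonzero pivots and exact rational arithmetic, and the name `invert_by_bordered_scheme` and the sample matrix are hypothetical.

```python
from fractions import Fraction


def invert_by_bordered_scheme(a):
    """Clear the part of the bordered array to the left of the vertical line
    below the diagonal; the lower right section is then the inverse."""
    n = len(a)
    # Upper sections: A and I. Lower sections: -I and a block of zeros.
    upper = [[Fraction(v) for v in a[i]] + [Fraction(int(i == j)) for j in range(n)]
             for i in range(n)]
    lower = [[Fraction(-int(i == j)) for j in range(n)] + [Fraction(0)] * n
             for i in range(n)]
    rows = upper + lower

    for col in range(n):
        pivot = rows[col][col]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        # Obtain zeros under the pivot, above and below the horizontal line.
        for r in range(col + 1, 2 * n):
            factor = rows[r][col] / pivot
            if factor != 0:
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]

    # Lower right section of the reduced array.
    return [row[n:] for row in rows[n:]]


a = [[4, 2, 1], [2, 5, 3], [1, 3, 6]]  # hypothetical symmetric example
a_inv = invert_by_bordered_scheme(a)
print(a_inv)
```

Each lower row ends as its original row plus a combination of the upper rows; once its left half has been reduced to zero, that combination can only be the corresponding row of the inverse.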

It will be observed that the operations on A in Albert's method which produce
zeros below the main diagonal are the same as those which occur above the
horizontal dividing line in Aitken's method. This set of operations is performed
simultaneously on I, since the upper right section of Aitken's scheme is I.
Furthermore, obtaining a zero for an element below the horizontal line and to the
left of the vertical line is equivalent to obtaining a zero for the element
corresponding to the same row and column in the section above the horizontal,
provided the preceding columns contain zeros above the diagonal. But obtaining
zeros above the main diagonal of A constitutes the second set of operations in
Albert's method to obtain A' = I. Thus, the operations in Aitken's method
which produce zeros in a given column for elements above the horizontal line
are merely the first set of operations in Albert's method, while those which
produce zeros below the horizontal line are the second set of operations in reverse
order. Since, in Aitken's scheme, the first set of operations is performed on I
in the upper right section and the results are transferred a row at a time to the
lower right section, where they are in turn operated upon by the second set of
operations, this lower right section is merely I operated upon by the entire set
of operations of Albert's method. Consequently, Aitken's and Albert's methods
are the same except for the order in which operations are performed and
differences arising therefrom. Since Aitken's method performs these operations more
compactly, it is to be preferred to that of Albert.

Next consider the method of Doolittle, which is described by following 
the instructions given in the first column in the table shown on page 34S. 
The forward solution is completed after n such sectional operations. For a 
given k column, the backward solution is obtained as usual by substitution in 
the last row of each section taken in reverse order. 

If all summations in each section are performed in pairs and the sums recorded
each time, rather than being performed in one operation, the forward solution
of the Doolittle method will be found to be a rearrangement of the work occurring
above the horizontal line in Aitken's method. Thus the first lines of each
section give the matrix above the horizontal line in Aitken's scheme. Then,
except for signs, I' and the sums of the first two lines of the remaining sections
give the result of Aitken's first sequence of operations above the horizontal.
Then, except for signs, II' and the sums of the first three lines of the remaining
sections give the result of Aitken's second sequence of operations above the
horizontal, etc.

The back solution involves precisely the same operations as those making up 
the second set of Albert’s sequence of operations to obtain zeros above the main 
diagonal. Since these were shown to be a rearrangement of operations in 
Aitken’s method, it follows that the methods of Aitken and Doolittle are the 
same except for the order of operations and differences arising therefrom. Hence 
all three methods are basically the same when systematized for a calculating 
machine. 

Because of this equivalence, the number of necessary multiplications and
divisions will be the same for all three methods, and will be found to be
$\tfrac{1}{2}n^2(n + 1)$. Since Aitken's method is to be preferred to that of Albert, it will
suffice to compare the methods of Aitken and Doolittle for calculating
convenience.

The Doolittle method possesses several distinct advantages. First, its
multiplications occur a row at a time with one of the factors constant for that row;
consequently the keyboard remains unchanged for a given row of operations.
Aitken's method, however, consists of calculating successive cross products,
which requires clearing of the keyboard after each such operation. Secondly,
there are fewer additions in the Doolittle method. It sums i quantities at a
time in section i, while Aitken's cross products always involve the sum of two
quantities. Because of the necessity of calculating the complements of negative
numbers, this difference becomes important when the number of variables is large.
A third feature in favor of the Doolittle method is the ease of performing the
calculations without previous experience. It may be easier to understand how
to calculate cross products, but actually the calculations of the Doolittle method
are easier to perform. Aitken's method requires some experience with it, if one
is to avoid repeating certain calculations which would result from calculating all
cross products mechanically. The comparative amount of copying in the two
methods depends upon the number of variables involved.

From the above considerations, it may be concluded that the Doolittle method
is to be preferred among those considered in this paper for solving a set of normal
equations or calculating the inverse of a symmetric matrix. However, if a
single calculating technique is desired which can be used for nonsymmetrical
equations as well, then the method of Aitken is to be preferred.


CONDITIONS THAT THE ROOTS OF A POLYNOMIAL BE LESS THAN
UNITY IN ABSOLUTE VALUE

By Paul A. Samuelson

Massachusetts Institute of Technology 

1. Introduction. In econometric business cycle analysis, probability theory,
and numerical mathematical computation the problem of convergence of
repeated iterations arises. The solution of the difference equations defining
such a process can in a wide variety of cases be shown to be stable in the sense of
converging to a limit if a certain associated polynomial

(1) $f(x) = p_0x^n + p_1x^{n-1} + \cdots + p_n = 0,$

has roots whose moduli are all less than unity.

Thus, for "timeless" linear difference equation systems of the most general
type, convertible into normal form,

(2) $Q_i(t + 1) = \sum_{j=1}^{n} a_{ij}Q_j(t), \qquad (i = 1, \ldots, n),$

the polynomial is the characteristic or determinantal equation,

(3) $f(x) = |a_{ij} - x\delta_{ij}| = 0,$

which when expanded out is of the form (1). The t-th powers of the roots of this
equation, when multiplied by suitable polynomials in t, give the exact solution of
the problem in the form

(4) $Q(t) = \sum_{i=1}^{m} g_i(t)\,x_i^t,$

where m is the number of distinct roots, and the $g_i$'s are polynomials of degree
one less than the multiplicity of the respective root. If complex roots occur,
they do so in conjugate pairs and can be combined to form damped, undamped,
or anti-damped harmonic terms. All terms go to zero as t approaches infinity
if, and only if, the absolute value of each x is less than unity.

For non-linear systems the exact solution does not take this form, but in the 
neighborhood of an equilibrium point the roots of an associated polynomial, 
except in singular cases, do determine the stability of the system. 

As far as the writer is aware, there does not appear in the literature an account 
of necessary and sufficient conditions for the roots of a polynomial to be less than 
unity in absolute value. This is in contrast to a related problem which arises 
in connection with the investigation of stability of dynamical systems defined by 
differential equations. These have associated with them a polynomial whose 
roots provide solutions in the form 

(5) $Q(t) = \sum_{i=1}^{m} g_i(t)\,e^{x_i t},$

or for non-linear systems infinite power series in such terms. It is required,
therefore, to determine complete conditions under which the real parts of all
roots must be negative.

This problem has been solved by Routh 1 in a manner which leaves little to be
desired. Determinantal expression of his conditions in a slightly modified form
was made by Hurwitz 2 who apparently was unaware of Routh's work, and by
Frazer and Duncan 3 who were unaware of the Hurwitz results. A brief outline
of Routh's mode of attack will prove instructive in dealing with the problem
at hand.


2. Routhian analysis of sign of real parts of roots. Routh realized
that the condition that all coefficients be positive (the leading coefficient having
been made so) was necessary, but not sufficient unless all the roots were real.
But a "derived" equation of degree n(n - 1)/2 whose roots equal the sums of the
roots of the original equation taken two at a time has real roots which are simple
sums of the real parts of those of the original equation. In consequence, it is
necessary and sufficient that the coefficients of the original and the "derived"
equation all be positive.

Thus, valid necessary and sufficient conditions are presented. However, they
are disadvantageous from two points of view. First, they are not all independent,
being n(n + 1)/2 conditions in number, whereas only n are necessary. Secondly,
despite several ingenious methods devised by Routh, it is not easy to
compute them in the general case.

Recognizing these difficulties, he therefore began anew from an entirely
different angle. Utilizing a theorem of Cauchy concerning the relationship
between the behavior of a polynomial on a closed contour in the complex domain
and the number of roots within that closed curve, he derived necessary and
sufficient conditions, which may be written in the slightly more convenient
determinantal form of Hurwitz and Frazer and Duncan as follows:

(6)
$$
T_0 = p_0 > 0, \quad T_1 = p_1 > 0, \quad
T_2 = \begin{vmatrix} p_1 & p_3\\ p_0 & p_2 \end{vmatrix} > 0, \quad
T_3 = \begin{vmatrix} p_1 & p_3 & p_5\\ p_0 & p_2 & p_4\\ 0 & p_1 & p_3 \end{vmatrix} > 0, \quad \ldots, \quad
T_n = \begin{vmatrix}
p_1 & p_3 & \cdots & p_{2n-1}\\
p_0 & p_2 & \cdots & p_{2n-2}\\
0 & p_1 & \cdots & p_{2n-3}\\
0 & p_0 & \cdots & p_{2n-4}\\
\vdots & \vdots & & \vdots
\end{vmatrix} > 0.
$$

1 E. J. Routh, A Treatise on the Stability of a Given State of Motion, (London, 1877),
Chaps. 2 and 3; Advanced Rigid Dynamics, 6th ed., London, 1906, Chap. 6.

2 Hurwitz, Math. Ann., Vol. 46 (1895), p. 521.

3 R. A. Frazer and W. J. Duncan, Royal Soc. Proc., Series A, Vol. 124 (1929), p. 642.
Also R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge
University Press, 1938, pp. 151-155.

The law of formation of these determinants is obvious. In the first row the
odd p's starting with the first are listed. Within each column the p's diminish
one unit at a time. Any p with negative subscript derived by this formula is
treated as zero, and all p's of subscript higher than the degree of the equation
are set equal to zero. With this convention, for $p_0$ made positive, complete and
independent necessary conditions are that all principal minors of $T_n$ formed by
deleting successively the last row and column must be positive. These
conditions are n in number and are independent.


3. Complete, independent, necessary and sufficient conditions.
Corresponding to Routh's first attack on the problem, we might consider an equation
of degree n(n - 1)/2 whose roots equal the products two at a time of the original
equation's. If this equation and the original equation have real roots less than
unity in absolute value, our problem is solved. This is guaranteed if, and only
if, two further transformed equations with roots equal to the squares minus unity
of the roots of the original and derived equations respectively all have positive
coefficients. These conditions are necessary and sufficient, but not independent,
and cannot be easily computed in the general case. Therefore, I follow Routh's
example and approach the problem from a different point of view.

When the roots of f(x) = 0 are plotted in the complex plane, they must all lie
within the unit circle if their absolute values are to be less than unity, and
conversely. We might therefore attempt to apply Cauchy's theorem. However,
it is not necessary to do so. Routh has shown what the conditions are that there
be no roots in the right-hand half-plane. Can we find a complex transformation
of variables which carries the unit circle into the left-hand half-plane?

The answer is in the affirmative. The linear complex transformation

(7) $x = \dfrac{z + 1}{z - 1}, \qquad z = \dfrac{x + 1}{x - 1},$

will accomplish this. But after substituting for x its value in terms of z, we
cease to have a polynomial but rather a rational function of z as follows:

(8) $f(x) = \dfrac{\sum_{i=0}^{n} p_i (z + 1)^{n-i}(z - 1)^{i}}{(z - 1)^{n}}.$

We need only consider the polynomial in the numerator, i.e.,

(9) $\phi(z) = \sum_{i=0}^{n} \pi_i z^{n-i} = 0.$


In order that the roots of the original equation be less than unity in absolute value,
it is necessary and sufficient that the real parts of the roots of equation (9) be negative.
Once we determine the coefficients ($\pi_i$) in terms of the original p's, we can easily
apply Routh's theorems. This yields n + 1 necessary and sufficient conditions,
all of which are independent.

Expanding the numerator of the right-hand side of (8) and collecting terms,
the following explicit formulas for the $\pi$'s are directly obtained:

(10) $\pi_i = \sum_{j=0}^{n} \sum_{k=0}^{m(i,j)} p_j \; {}_{n-j}C_{i-k}\,(-1)^k\, {}_{j}C_{k},$

where

$${}_{v}C_{w} = \frac{v!}{(v - w)!\,w!}$$

(taken as zero when w exceeds v) and

m(i, j) = the smaller of i and j.

For fourth and higher degree equations literal substitution, while always
possible, results in complicated expressions. It is preferable, therefore, to
compute the $\pi$'s numerically and then apply the conditions of (6) directly.
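The numerical route recommended above can be sketched in Python. The code below is an illustration under stated assumptions, not a procedure given in the text: it evaluates (10) with `math.comb`, forms the determinants of (6) from the resulting coefficients by exact elimination, and uses the hypothetical names `transformed_coefficients`, `hurwitz_determinants` and `roots_inside_unit_circle`. The sample coefficients are those of the fourth-degree equation treated in the example section of this paper.

```python
from fractions import Fraction
from math import comb


def transformed_coefficients(p):
    """pi_i of formula (10): coefficient of z^(n-i) in
    sum_j p_j (z+1)^(n-j) (z-1)^j."""
    n = len(p) - 1
    pis = []
    for i in range(n + 1):
        total = 0
        for j in range(n + 1):
            for k in range(min(i, j) + 1):
                # math.comb returns 0 when the lower index exceeds the upper.
                total += p[j] * comb(n - j, i - k) * (-1) ** k * comb(j, k)
        pis.append(total)
    return pis


def determinant(m):
    """Determinant by elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in m]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        pivot_row = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != c:
            m[c], m[pivot_row] = m[pivot_row], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return det


def hurwitz_determinants(c):
    """T_0, ..., T_n of (6) for coefficients c_0, ..., c_n."""
    n = len(c) - 1

    def coef(idx):
        return c[idx] if 0 <= idx <= n else 0

    dets = [Fraction(c[0])]
    for k in range(1, n + 1):
        minor = [[coef(2 * col + 1 - row) for col in range(k)] for row in range(k)]
        dets.append(determinant(minor))
    return dets


def roots_inside_unit_circle(p):
    if p[0] < 0:  # make the leading coefficient positive
        p = [-v for v in p]
    return all(t > 0 for t in hurwitz_determinants(transformed_coefficients(p)))


p = [Fraction(s) for s in ("1", "-0.398", "0.220", "-0.013", "-0.027")]
pis = transformed_coefficients(p)
dets = hurwitz_determinants(pis)
print([float(v) for v in pis])
print([round(float(t), 3) for t in dets])
print(roots_inside_unit_circle(p))
```

Coefficients are entered as decimal strings so that the arithmetic is exact; with floating-point input the same functions apply, but a determinant very close to zero should then be judged with some tolerance.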

Other necessary conditions can be easily derived, but they will be dependent
upon these. Thus, each $\pi$ must be positive; but this is not, by itself, sufficient.
Or, adding $\pi_0$ and $\pi_n$ we find

(11) $\pi_0 + \pi_n = 2(p_0 + p_2 + p_4 + \cdots) > 0,$

i.e., the sum of the even p's must be positive. Similarly, still other linear sums
of other $\pi$'s will result in cancellation of certain of the p's. Except on special
occasions there is probably no labor saved by utilizing conditions derived in
this way.
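As a worked illustration, not given in the original, the quadratic case n = 2 with $p_0 > 0$ can be written out in full. The numerator of (8) is expanded directly and the conditions (6) are applied to the resulting coefficients, with $\pi_3$ set equal to zero by the convention stated for (6).

```latex
\begin{aligned}
p_0(z+1)^2 + p_1(z+1)(z-1) + p_2(z-1)^2
  &= (p_0+p_1+p_2)\,z^2 + 2(p_0-p_2)\,z + (p_0-p_1+p_2),\\
\pi_0 &= p_0+p_1+p_2,\qquad \pi_1 = 2(p_0-p_2),\qquad \pi_2 = p_0-p_1+p_2,\\
T_0 &= \pi_0 > 0,\qquad T_1 = \pi_1 > 0,\qquad
T_2 = \begin{vmatrix}\pi_1 & 0\\ \pi_0 & \pi_2\end{vmatrix} = \pi_1\pi_2 > 0 .
\end{aligned}
```

For a quadratic, then, positivity of the three coefficients is both necessary and sufficient, and $\pi_0 + \pi_2 = 2(p_0 + p_2)$ in agreement with (11); it is only from the third degree onward that the determinants add conditions beyond the signs of the coefficients.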

One obvious but useful necessary condition will be stated without proof.
If one forms polynomials from subsets of the coefficients of a given "stable"
polynomial formed by arbitrary "cuts" which leave adjacent coefficients in
unchanged order and introduce no gaps within each set, then the resulting
polynomials will all be stable.

Special sufficiency conditions also can be developed. Carmichael 4 presents
certain inequalities between the absolute values of the largest root and the
coefficients of the original equation. For special problems these may be fruitfully
applied.

4. Example. In conclusion I apply the conditions derived here to a well-known
numerical equation determined statistically by Tinbergen 5 in the analysis
of economic fluctuations. It is a fourth order difference equation with constant
coefficients,

(12) $Z_t - .398Z_{t-1} + .220Z_{t-2} - .013Z_{t-3} - .027Z_{t-4} = 0$

with the associated indicial equation

(13) $f(x) = x^4 - .398x^3 + .220x^2 - .013x - .027 = 0.$

4 R. D. Carmichael, Amer. Math. Soc. Bull., Vol. 24 (1918), pp. 286-296.

5 J. Tinbergen, Business Cycles in the United States, 1919-1933, League of Nations, 1939,
p. 140.

Its roots have been computed and are known to be less than unity in absolute 
value. This may be verified by computing 


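(14)
$$
\begin{aligned}
\pi_0 &= 0.782 > 0 \\
\pi_1 &= 3.338 > 0 \\
\pi_2 &= 5.398 > 0 \\
\pi_3 &= 4.878 > 0 \\
\pi_4 &= 1.604 > 0 \\
T_2 &= 14.204 > 0 \\
T_3 &= 51.415 > 0
\end{aligned}
$$

To compute the same results by cross-multiplication the work is arranged as
follows:

(15)
$$
\begin{array}{lll}
\pi_0 & \pi_2 & \pi_4 \\
.782 & 5.398 & 1.604 \\[4pt]
\pi_1 & \pi_3 & \\
3.338 & 4.878 & \\[4pt]
\pi_1\pi_2 - \pi_0\pi_3 & \pi_1\pi_4 - 0 & \\
14.204 & 5.354 & \\[4pt]
\pi_3(\pi_1\pi_2 - \pi_0\pi_3) - \pi_1\pi_1\pi_4 & & \\
51.415 & &
\end{array}
$$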


It may be remarked that the presence of a negative coefficient anywhere in 
the table is an immediate indication of instability, and that there is no necessity 
to continue the computation until a negative sign appears in a leading coefficient. 
This fact often saves much labor. 
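The stability found for equation (12) can also be seen by iterating the difference equation directly. The following Python sketch is only an illustration: the four starting values are arbitrary assumptions, and the function name `iterate_difference_equation` is hypothetical.

```python
def iterate_difference_equation(initial, steps):
    """Iterate (12), solved for Z_t:
    Z_t = .398 Z_{t-1} - .220 Z_{t-2} + .013 Z_{t-3} + .027 Z_{t-4}."""
    z = list(initial)  # four arbitrary starting values
    for _ in range(steps):
        z.append(0.398 * z[-1] - 0.220 * z[-2] + 0.013 * z[-3] + 0.027 * z[-4])
    return z


path = iterate_difference_equation([1.0, -2.0, 0.5, 3.0], 200)
largest_late_value = max(abs(v) for v in path[-4:])
print(largest_late_value)
```

Whatever starting values are chosen, the iterates die away, as they must when every root of (13) is less than unity in absolute value.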


VALUES OF MILLS' RATIO OF AREA TO BOUNDING ORDINATE AND OF
THE NORMAL PROBABILITY INTEGRAL FOR LARGE VALUES
OF THE ARGUMENT

By Robert D. Gordon
Scripps Institution of Oceanography

A pair of simple inequalities is proved which constitute upper and lower
bounds for the ratio $R_x$, 1 valid for x > 0. The writer has failed to encounter
these inequalities in the literature, hence it seems worthwhile to present them
for whatever value they may have.

1 J. P. Mills, "Table of ratio: area to bounding ordinate, for any portion of the normal
curve." Biometrika, Vol. 18 (1926), pp. 395-400. Also Pearson's tables, Part II, Table III.

The function $R_x$ is defined by

(1) $R_x = e^{x^2/2} \int_x^{\infty} e^{-t^2/2}\,dt.$

The following relations between $R = R_x$ and its derivatives are easily established
by direct differentiations and substitutions:

(2) $\dfrac{dR}{dx} = xR - 1,$

(3) $\dfrac{d^2R}{dx^2} = x\dfrac{dR}{dx} + R = \dfrac{x^2 + 1}{x}\,\dfrac{dR}{dx} + \dfrac{1}{x},$

(4) $\dfrac{d^3R}{dx^3} = \left(x + \dfrac{2x}{x^2 + 1}\right)\dfrac{d^2R}{dx^2} - \dfrac{2}{x^2 + 1}.$

Also by ordinary rules

(5) $R_x > 0,$

(6) $\lim_{x \to \infty} xR_x = 1.$


1°. Suppose that at any point $x_1 > 0$, $x_1R_{x_1} > 1$. Then by (2) dR/dx > 0,
and $R_x$ would continue to increase with increasing x: still more, $xR_x$ would
continue to increase, hence we should have $xR_x \ge x_1R_{x_1} > 1$ for $x \ge x_1$, which
contradicts (6). Therefore we find $xR_x \le 1$ for x > 0, and

(7) $R_x \le \dfrac{1}{x},$

which establishes the required upper inequality.

2°. Suppose that at any point $x_2 > 0$, $d^2R/dx^2 < 0$. Then by (4)
$d^3R/dx^3 = (d/dx)(d^2R/dx^2) < 0$ at this point. Since these derivatives are continuous this
implies that for all $x > x_2$, $d^2R/dx^2 < [d^2R/dx^2]_{x_2} < 0$. Then we get the
inequalities, for $x > x_2$,

$$\frac{dR}{dx} < \left[\frac{dR}{dx}\right]_{x_2} + (x - x_2)\left[\frac{d^2R}{dx^2}\right]_{x_2},$$

$$R < [R]_{x_2} + (x - x_2)\left[\frac{dR}{dx}\right]_{x_2} + \tfrac{1}{2}(x - x_2)^2\left[\frac{d^2R}{dx^2}\right]_{x_2},$$

where $[\;]_{x_2}$ indicates evaluation at $x = x_2$. Since $[d^2R/dx^2]_{x_2} < 0$, this implies
that for sufficiently large x, $R_x < 0$, which contradicts (5). It follows then
that (3) cannot be negative, and substitution of (2) gives

(8) $R_x \ge \dfrac{x}{x^2 + 1}.$

We combine (7) and (8) in the double inequality:

(9) $\dfrac{x}{x^2 + 1} \le R_x \le \dfrac{1}{x}.$

This gives for the probability integral the corresponding inequality

(10) $\dfrac{x}{x^2 + 1}\,\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \le \dfrac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt \le \dfrac{1}{x}\,\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$

It can easily be shown (for x > 0) that equalities in (9) and (10) are impossible.
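The double inequality (9) is easy to check numerically. The Python sketch below is an illustration only: it assumes the standard identity expressing the integral in (1) through the complementary error function, $\int_x^\infty e^{-t^2/2}dt = \sqrt{\pi/2}\,\mathrm{erfc}(x/\sqrt{2})$, and the function names are hypothetical.

```python
import math


def mills_ratio(x):
    """R_x = exp(x^2/2) * integral from x to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.exp(x * x / 2.0) * math.erfc(x / math.sqrt(2.0))


def gordon_bounds(x):
    """Lower and upper bounds of the double inequality (9), valid for x > 0."""
    return x / (x * x + 1.0), 1.0 / x


for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    lower, upper = gordon_bounds(x)
    print(x, lower, mills_ratio(x), upper)
```

The two bounds close in on the ratio as x grows, which is what makes them useful for large values of the argument. For very large x the product of a huge exponential and a tiny error function loses accuracy or overflows, so this direct formula is meant for moderate x only.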



DISTRIBUTION OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE 
DIFFERENCE TO THE VARIANCE 


By John von Neumann 


Institute for Advanced Study 1 

1. Introduction. Let $x_1, \ldots, x_n$ be variables representing n successive
observations in a population which obeys a distribution law

$$c\,e^{-(x - \xi)^2/2\sigma^2}\,dx, \qquad \left(c = \frac{1}{\sigma\sqrt{2\pi}}\right),$$

i.e. which is normal, with the mean $\xi$ and the standard deviation $\sigma$. For the
sample we define as usual the mean,

$$\bar{x} = \frac{1}{n}\sum_{\mu=1}^{n} x_\mu,$$

the variance,

$$s^2 = \frac{1}{n}\sum_{\mu=1}^{n} (x_\mu - \bar{x})^2,$$

and also the mean square successive difference

$$\delta^2 = \frac{1}{n-1}\sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2.$$

The reasons for the study of the distribution of the mean square successive
difference $\delta^2$, in itself as well as in its relationship to the variance $s^2$, have been
set forth in a previous publication 2, to which the reader is referred. The
distribution of $\delta^2$, and in particular its moments, were also studied there. The present
paper is devoted to the investigation of the ratio

$$\eta = \frac{\delta^2}{s^2}.$$

A comparison of the observed value of $\eta$ with its distribution is particularly
suited as a basis of the judgment whether the observations $x_1, \ldots, x_n$ are
independent or whether a trend exists. (Cf. sections 1 and 2, loc. cit. 2)
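The three sample quantities just defined can be computed as in the following Python sketch. It is an illustration only: the function name `von_neumann_ratio` and the two short series are hypothetical, and the variance uses the divisor n exactly as in the definition above.

```python
def von_neumann_ratio(x):
    """eta = delta^2 / s^2 for a sequence of successive observations."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / n  # variance, divisor n
    d2 = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return d2 / s2  # fails with ZeroDivisionError if all observations are equal


trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # steadily rising series
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign changes at every step
print(von_neumann_ratio(trend), von_neumann_ratio(alternating))
```

A steady trend keeps successive differences small relative to the overall spread and so gives a small ratio, while rapid alternation gives a large one.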

The moments of $\eta$ have already been determined by J. D. Williams by a
different method. 3 Williams' results have been checked by W. J. Dixon at the
suggestion of S. S. Wilks, whose stimulating interest has been largely responsible
for the undertaking of the series of papers on $\delta^2$ and $\delta^2/s^2$. The present rather
exhaustive discussion, however, brings out several other essential characteristics
of this statistic, and provides the key to some very effective computational
methods. It is further hoped that the reader will find that the mathematical
methods used and the generalizations indicated have an interest of their own.

1 Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen
Proving Ground.

2 John von Neumann, R. H. Kent, H. R. Bellinson, B. I. Hart, "The mean square
successive difference," Annals of Math. Stat., Vol. 12 (1941), pp. 153-162.

From the latter point of view the final results of sections 5 and 7, concerning
the distribution of values of quadratic and of Hermitian forms, may deserve
special attention.

2. Diagonalization of the quadratic forms and replacement by a spherical
mean. Since $\delta^2$ and $s^2$ are unchanged when we replace each $x_\mu$ by $x_\mu - \xi$, we
may assume $\xi = 0$. Then the distribution law of x is

$$c\,e^{-x^2/2\sigma^2}\,dx, \quad\text{and that of } x_1, \ldots, x_n \text{ is}\quad \prod_{\mu=1}^{n} c\,e^{-x_\mu^2/2\sigma^2}\,dx_\mu,$$

i.e.

$$c^n e^{-\sum_{\mu=1}^{n} x_\mu^2/2\sigma^2}\,dx_1 \cdots dx_n.$$

Any linear orthogonal transformation of the $x_1, \ldots, x_n$ leaves $\sum_{\mu=1}^{n} x_\mu^2$ and
$dx_1 \cdots dx_n$ unchanged, hence the above distribution law will likewise be left
unchanged. Thus, we may subject the two quadratic forms $\delta^2$, $s^2$ to any
simultaneous linear, orthogonal transformation.

Consider one carrying $x_1, \ldots, x_n$ into, say $x'_1, \ldots, x'_n$, which brings the
quadratic form $(n - 1)\delta^2$ into the diagonal form, say $\sum_{\mu=1}^{n} A_\mu x'^2_\mu$. Such a
transformation does not affect the characteristic values of the quadratic forms 4, and
these characteristic values are obviously $A_1, \ldots, A_n$ in the case of $\sum_{\mu=1}^{n} A_\mu x'^2_\mu$.
Consequently $A_1, \ldots, A_n$ are the characteristic values of the original quadratic
form $(n - 1)\delta^2$. We shall determine them as such in the next section.

Clearly we always have $(n - 1)\delta^2 \ge 0$, hence all $A_\mu \ge 0$. Some $A_\mu$ may
equal 0, say k (= 0, 1, ..., n) of them, which we can arrange to be $A_{n-k+1}, \ldots, A_n$.

$(n - 1)\delta^2 = 0$ is thus equivalent to $x'_1 = \cdots = x'_{n-k} = 0$, i.e. to n - k
independent conditions. On the other hand this amounts obviously to $x_1 = \cdots = x_n$,
and these are n - 1 independent conditions. So k = 1 and consequently
$A_1, \ldots, A_{n-1} > 0$, $A_n = 0$. And our linear orthogonal transformation must
carry the x-vectors with $x_1 = \cdots = x_n$ into the x'-vectors with
$x'_1 = \cdots = x'_{n-1} = 0$. Among the former, $\pm\left(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right)$ has the length 1; among
the latter only $0, \ldots, 0, \pm 1$ have. Hence these correspond to each other.
Now the scalar (inner) product of two vectors is an orthogonal invariant, that
of a vector $x_1, \ldots, x_n$ with $\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}$ is $\sqrt{n}\,\bar{x}$, that of a vector $x'_1, \ldots, x'_n$
with $0, \ldots, 0, \pm 1$ is $\pm x'_n$, hence

$$\sqrt{n}\,\bar{x} = \pm x'_n.$$

3 J. D. Williams, "Moments of the ratio of the mean square successive difference to the
mean square difference in samples from a normal universe," Annals of Math. Stat., Vol. 12
(1941), pp. 239-241. Cf. also L. C. Young, "On randomness in ordered sequences," Annals
of Math. Stat., Vol. 12 (1941), pp. 293-300.

4 For the properties of matrices and quadratic forms cf. e.g.: J. H. M. Wedderburn,
Lectures on Matrices, Amer. Math. Soc. Colloquium Publications, Vol. 17, New York, 1934.
In the present context cf. mainly Chapters II and VI.

Put $x_\mu = \bar{x} + u_\mu$. Then clearly $\sum_{\mu=1}^{n} u_\mu = 0$. Hence

$$\sum_{\mu=1}^{n} x_\mu^2 = n\bar{x}^2 + \sum_{\mu=1}^{n} u_\mu^2 = x'^2_n + ns^2.$$

Owing to the orthogonality, the left-hand side is equal to $\sum_{\mu=1}^{n} x'^2_\mu$, therefore

$$ns^2 = \sum_{\mu=1}^{n-1} x'^2_\mu.$$

Remembering that $A_n = 0$, we also have

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} A_\mu x'^2_\mu.$$

Consequently

$$\eta = \frac{\delta^2}{s^2} = \frac{n}{n-1}\,\frac{\sum_{\mu=1}^{n-1} A_\mu x'^2_\mu}{\sum_{\mu=1}^{n-1} x'^2_\mu}.$$

The distribution law is, as we know, the same in $x'_1, \ldots, x'_n$ as in $x_1, \ldots, x_n$,
namely

$$c^n e^{-\sum_{\mu=1}^{n} x'^2_\mu/2\sigma^2}\,dx'_1 \cdots dx'_n.$$

Thus $x'_1, \ldots, x'_n$ are independent, $\eta$ depends on $x'_1, \ldots, x'_{n-1}$ only, hence we
may disregard $x'_n$ altogether, and use the distribution law of the $x'_1, \ldots, x'_{n-1}$,

$$c^{n-1} e^{-\sum_{\mu=1}^{n-1} x'^2_\mu/2\sigma^2}\,dx'_1 \cdots dx'_{n-1}.$$

With respect to $x'_1, \ldots, x'_{n-1}$ we may now state that the $x'_1, \ldots, x'_{n-1}$
distribution of $\eta$ can be obtained by determining first the distribution of $\eta$ over
every spherical surface

$$\sum_{\mu=1}^{n-1} x'^2_\mu = r^2,$$

and then averaging these distributions with the weights w(r) dr, where w(r) dr
is the probability of the spherical shell from r to r + dr with respect to our
original $x'_1, \ldots, x'_{n-1}$ distribution law. (It happens to be $c' e^{-r^2/2\sigma^2} r^{n-2}\,dr$, but
this is immaterial.)

Since the $x'_1, \ldots, x'_{n-1}$ distribution law is obviously spherically symmetric
in these variables, the first-mentioned distributions over the spherical surfaces
are readily obtained by assigning each piece of the surfaces in question its own
relative, (n - 2)-dimensional area as weight.

Since $\eta$ is a homogeneous function of $x'_1, \ldots, x'_{n-1}$ of order zero, these spherical
surface distributions of $\eta$ are the same for all r. Consequently we can replace
all these r by, say r = 1, and the subsequent averaging over the r may be omitted
altogether.

Finally, since we restrict ourselves to r = 1, i.e. to the spherical surface

$$\sum_{\mu=1}^{n-1} x'^2_\mu = 1,$$

the denominator of $\eta$ may be omitted and we have

$$\eta = \frac{n}{n-1}\sum_{\mu=1}^{n-1} A_\mu x'^2_\mu.$$

We sum up, writing again $x_1, \ldots, x_{n-1}$ for $x'_1, \ldots, x'_{n-1}$: the desired
distribution of $\eta$ is that of

$$\eta = \frac{n}{n-1}\sum_{\mu=1}^{n-1} A_\mu x_\mu^2,$$

where the point $x_1, \ldots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$


Here $A_1, \ldots, A_{n-1}$ are all positive, and together with 0 they are the
characteristic values of the quadratic form

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2 = x_1^2 + 2\sum_{\mu=2}^{n-1} x_\mu^2 + x_n^2 - 2\sum_{\mu=1}^{n-1} x_\mu x_{\mu+1}.$$

3. The characteristic values; first orientation concerning $\eta$. We have
shown that there exist (counting multiplicities) precisely n - 1 positive roots
A of the characteristic equation

$$
\begin{vmatrix}
A-1 & 1 & & & & \\
1 & A-2 & 1 & & & \\
 & 1 & A-2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & A-2 & 1 \\
 & & & & 1 & A-1
\end{vmatrix} = 0
$$

(the empty places are filled with zeros), and that these roots are the
$A_1, \ldots, A_{n-1}$.

Such an $A$ is characterized by the possibility of solving the equations

$$(A-1)x_1 + x_2 = 0,\quad x_1 + (A-2)x_2 + x_3 = 0,\quad x_2 + (A-2)x_3 + x_4 = 0,\quad \dots,$$
$$x_{n-2} + (A-2)x_{n-1} + x_n = 0,\quad x_{n-1} + (A-1)x_n = 0,$$

in $x_1, \dots, x_n$ not all equal to zero. Put

$$x_0 = x_1, \qquad x_{n+1} = x_n,$$

and

$$A = 2 - 2\cos\alpha,$$

then these equations become

$$x_{\mu-1} + x_{\mu+1} = 2\cos\alpha \cdot x_\mu \quad \text{for } \mu = 1, 2, \dots, n-1, n.$$

The last equation is satisfied by

$$x_\mu = 2\cos\big((\mu - \tfrac12)\alpha\big) \quad \text{for } \mu = 0, 1, 2, \dots, n-1, n, n+1.$$

Now $x_0 = x_1$ is automatically fulfilled, while $x_{n+1} = x_n$ demands $\cos(n + \tfrac12)\alpha = \cos(n - \tfrac12)\alpha$. This is certainly the case when $(n + \tfrac12)\alpha = 2k\pi - (n - \tfrac12)\alpha$ ($k$ any integer), i.e. $\alpha = \frac{k\pi}{n}$. For no $k = 1, \dots, n-1$ are $x_1, \dots, x_n$ all equal to zero (indeed $x_1 = 2\cos\frac{k\pi}{2n} > 0$), hence these $k$ give $A$'s of the desired kind. They are

$$A = 2 - 2\cos\frac{k\pi}{n} = 4\sin^2\frac{k\pi}{2n} \quad (k = 1, \dots, n-1),$$

and so they are all positive and different from each other. Their number is $n - 1$. Hence they are precisely $A_1, \dots, A_{n-1}$.
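The characteristic values and the solutions $x_\mu = 2\cos((\mu - \tfrac12)\alpha)$ found above can be verified numerically. The sketch below is an illustration only: it builds the matrix of the quadratic form $(n-1)\delta^2$ (diagonal $1, 2, \dots, 2, 1$ and $-1$ next to the diagonal) and measures how far each claimed pair is from satisfying the characteristic equations. The function names are hypothetical, and $n \ge 2$ is assumed.

```python
import math


def quadratic_form_matrix(n):
    """Matrix of (n-1)*delta^2: diagonal 1,2,...,2,1 and -1 next to the diagonal (n >= 2)."""
    rows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rows[i][i] = 1.0 if i in (0, n - 1) else 2.0
        if i + 1 < n:
            rows[i][i + 1] = rows[i + 1][i] = -1.0
    return rows


def characteristic_pair(n, k):
    """Claimed value A = 4 sin^2(k*pi/2n) and vector x_mu = 2 cos((mu - 1/2) k*pi/n)."""
    alpha = k * math.pi / n
    value = 4.0 * math.sin(alpha / 2.0) ** 2
    vector = [2.0 * math.cos((mu - 0.5) * alpha) for mu in range(1, n + 1)]
    return value, vector


def residual(n, k):
    """Largest component of (matrix * vector - value * vector); zero for an exact pair."""
    matrix = quadratic_form_matrix(n)
    value, vector = characteristic_pair(n, k)
    image = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    return max(abs(image[i] - value * vector[i]) for i in range(n))
```

Here `k = 0` corresponds to the characteristic value 0 (constant vector), and `k = 1, ..., n-1` to the positive values $A_1, \dots, A_{n-1}$.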
So we have shown

$$A_\mu = 4\sin^2\frac{\mu\pi}{2n} \quad (\mu = 1, \dots, n-1).$$

We can now reformulate the final result of the preceding section. Let us set

$$\eta = \frac{2n}{n-1}(1 - \varepsilon).$$

Then

$$\varepsilon = \sum_{\mu=1}^{n-1} \cos\frac{\mu\pi}{n}\, x_\mu^2,$$

where the point $x_1, \dots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$

Replacement of $x_\mu$ by $x_{n-\mu}$ carries $\varepsilon$ into $-\varepsilon$. Therefore the distribution of $\varepsilon$ is symmetric around 0. Hence the mean of $\varepsilon$ is 0. The maximum of $\varepsilon$'s distribution is clearly $\cos\frac{\pi}{n}$, its minimum is $\cos\frac{(n-1)\pi}{n} = -\cos\frac{\pi}{n}$. We state these facts, together with their equivalents for $\eta$.

$\varepsilon$ ($\eta$)'s distribution is symmetric around its mean, which is 0 $\left(\frac{2n}{n-1}\right)$,

maximum of $\varepsilon$ ($\eta$)'s distribution is $\cos\frac{\pi}{n}$ $\left(\frac{2n}{n-1}\left[1 + \cos\frac{\pi}{n}\right] = \frac{4n}{n-1}\cos^2\frac{\pi}{2n}\right)$,

minimum of $\varepsilon$ ($\eta$)'s distribution is $-\cos\frac{\pi}{n}$ $\left(\frac{2n}{n-1}\left[1 - \cos\frac{\pi}{n}\right] = \frac{4n}{n-1}\sin^2\frac{\pi}{2n}\right)$.

Thus it will be easier to obtain information concerning $\eta$ by considering the distribution of $\varepsilon$, since all odd moments of $\varepsilon$ are zero, etc. The investigation of $\varepsilon$ instead of $\eta$ was first suggested by B. I. Hart, who also found that the first four odd moments of $\varepsilon$ vanish. R. H. Kent and B. I. Hart also determined the minima and maxima of these distributions for certain small values of $n$.
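The correspondence between $\varepsilon$ and $\eta$ stated above is a simple affine map, so the mean and the extremes of $\eta$ follow directly from those of $\varepsilon$. A minimal sketch, with illustrative function names that are not taken from the paper:

```python
import math


def eta_from_epsilon(n, eps):
    """eta = 2n/(n-1) * (1 - epsilon), the substitution used in section 3."""
    return 2.0 * n / (n - 1) * (1.0 - eps)


def eta_range(n):
    """Return (minimum, mean, maximum) of eta for sample size n >= 2."""
    c = math.cos(math.pi / n)          # maximum of epsilon; its minimum is -c
    return (eta_from_epsilon(n, c),    # epsilon at its maximum gives eta at its minimum
            eta_from_epsilon(n, 0.0),  # mean of epsilon is 0
            eta_from_epsilon(n, -c))
```

Note that the maximum of $\varepsilon$ corresponds to the minimum of $\eta$, since the map reverses orientation.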


4. Direct computation of the moments. We shall investigate the distribution law of a quantity

$$\gamma = \sum_{\mu=1}^{m} B_\mu x_\mu^2,$$

where the point $x_1, \dots, x_m$ is equidistributed over the spherical surface

$$\sum_{\mu=1}^{m} x_\mu^2 = 1.$$

Our above $\varepsilon$ obtains by putting $m = n - 1$ and $B_\mu = \cos\frac{\mu\pi}{n}$.
We denote the mean of any function

$$f(x_1, \dots, x_m)$$

over the above-mentioned spherical surface (the $x_1, \dots, x_m$ being equidistributed over it) by

$$\overline{f(x_1, \dots, x_m)}.$$

Our primary objective is to determine the moments of this distribution

$$M_p = \overline{\gamma^p} = \overline{\Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^p} \quad (p = 0, 1, 2, \dots).$$

Let us write $S_m$ for the ($(m-1)$-dimensional) area of the above-mentioned spherical surface (of the unit sphere in $m$-dimensional Euclidean space).

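Now we form the function

$$f(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{\mu=1}^{m} (1 - B_\mu z) x_\mu^2}\, dx_1 \cdots dx_m.$$

(This integral, as well as all others which we are going to derive from it, is obviously convergent, as long as $z$ is sufficiently small. More precisely this is true when

$$|z| \cdot \operatorname{Max}(|B_1|, \dots, |B_m|) < 1.$$

We shall use them only in the neighborhood of $z = 0$.) Now clearly

$$\begin{aligned}
\left[\frac{d^p}{dz^p} f(z)\right]_{z=0} &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^p e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m \\
&= \int_0^\infty M_p\, r^{2p} e^{-r^2} S_m r^{m-1}\, dr \\
&= S_m M_p \int_0^\infty e^{-r^2} r^{2p+m-1}\, dr \\
&\overset{5}{=} \tfrac12 S_m M_p \int_0^\infty e^{-u} u^{p + \frac m2 - 1}\, du \\
&= \tfrac12 S_m M_p\, \Gamma\big(p + \tfrac m2\big).
\end{aligned}$$

5 Introduce the new integration variable $u = r^2$.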
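On the other hand

$$\begin{aligned}
f(z) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{\mu=1}^{m} (1 - B_\mu z) x_\mu^2}\, dx_1 \cdots dx_m \\
&= \prod_{\mu=1}^{m} \int_{-\infty}^{\infty} e^{-(1 - B_\mu z) x_\mu^2}\, dx_\mu \\
&\overset{6}{=} \prod_{\mu=1}^{m} (1 - B_\mu z)^{-1/2} \int_0^\infty e^{-u} u^{-1/2}\, du \\
&= \prod_{\mu=1}^{m} (1 - B_\mu z)^{-1/2}\, \Gamma(\tfrac12) \\
&= \Gamma(\tfrac12)^m\, \Phi(z)^{-1/2},
\end{aligned}$$

where

$$\Phi(z) = \prod_{\mu=1}^{m} (1 - B_\mu z).$$

Thus

$$\tfrac12 S_m M_p\, \Gamma\big(p + \tfrac m2\big) = \Gamma(\tfrac12)^m \left[\frac{d^p}{dz^p} \Phi(z)^{-1/2}\right]_{z=0}.$$

For $p = 0$ this becomes, since $M_0 = 1$, $\Phi(0) = 1$,

$$\tfrac12 S_m\, \Gamma\big(\tfrac m2\big) = \Gamma(\tfrac12)^m.$$

Dividing the former equation by the latter gives, since $\Gamma(p + \frac m2)/\Gamma(\frac m2) = \frac m2\big(\frac m2 + 1\big)\cdots\big(\frac m2 + p - 1\big)$,

$$M_p = \frac{1}{\frac m2\big(\frac m2 + 1\big)\cdots\big(\frac m2 + p - 1\big)} \left[\frac{d^p}{dz^p} \Phi(z)^{-1/2}\right]_{z=0}.$$

In order to make a practical use of the above formula, we compute

$$\ln\big(\Phi(z)^{-1/2}\big) = -\tfrac12 \sum_{\mu=1}^{m} \ln(1 - B_\mu z) = \sum_{l=1}^{\infty} \frac{1}{2l}\Big(\sum_{\mu=1}^{m} B_\mu^l\Big) z^l.$$

6 Introduce the new integration variable $u = (1 - B_\mu z) x_\mu^2$.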
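Write

$$\alpha_l = \frac{1}{2l} \sum_{\mu=1}^{m} B_\mu^l,$$

then

$$\ln\big(\Phi(z)^{-1/2}\big) = \sum_{l=1}^{\infty} \alpha_l z^l,$$

and so

$$\Phi(z)^{-1/2} = e^{\sum_{l=1}^{\infty} \alpha_l z^l} = 1 + \beta_1 z + \beta_2 z^2 + \beta_3 z^3 + \cdots.$$

Clearly

$$M_p = \frac{1 \cdot 2 \cdots p}{\frac m2\big(\frac m2 + 1\big)\cdots\big(\frac m2 + p - 1\big)}\, \beta_p,$$

$$\beta_1 = \alpha_1,$$
$$\beta_2 = \alpha_2 + \tfrac12 \alpha_1^2,$$
$$\beta_3 = \alpha_3 + \alpha_1\alpha_2 + \tfrac16 \alpha_1^3,$$
$$\beta_4 = \alpha_4 + \tfrac12 \alpha_2^2 + \alpha_1\alpha_3 + \tfrac12 \alpha_1^2\alpha_2 + \tfrac1{24}\alpha_1^4.$$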
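The chain $B_\mu \to \alpha_l \to \beta_p \to M_p$ described above is easy to mechanize. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it uses exact rational arithmetic, and it obtains the $\beta_p$ from the standard recurrence $p\,\beta_p = \sum_{l=1}^{p} l\,\alpha_l\,\beta_{p-l}$ for the coefficients of the exponential of a power series, rather than from the explicit expressions listed above. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def alphas(B, order):
    """alpha_l = (1 / 2l) * sum(B_mu ** l) for l = 1..order (index 0 is unused)."""
    return [Fraction(0)] + [sum(Fraction(b) ** l for b in B) / (2 * l)
                            for l in range(1, order + 1)]


def betas(alpha):
    """Coefficients of exp(sum alpha_l z^l), via p*beta_p = sum_l l*alpha_l*beta_{p-l}."""
    beta = [Fraction(1)]
    for p in range(1, len(alpha)):
        beta.append(sum(l * alpha[l] * beta[p - l] for l in range(1, p + 1)) / p)
    return beta


def moments(B, order):
    """M_p = p! * beta_p / ((m/2)(m/2 + 1)...(m/2 + p - 1)) for p = 0..order."""
    m = len(B)
    beta = betas(alphas(B, order))
    result = []
    for p in range(order + 1):
        rising = Fraction(1)
        for j in range(p):
            rising *= Fraction(m, 2) + j
        result.append(factorial(p) * beta[p] / rising)
    return result
```

For $m = 2$, $B = (1, -1)$ the quantity $\gamma = x_1^2 - x_2^2$ is $\cos 2\theta$ on the unit circle, so the even moments should be those of a cosine with uniformly distributed argument; for $m = 1$ the sphere degenerates to $x_1^2 = 1$ and $\gamma = B_1$ exactly.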

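In our application (cf. above)

$$B_{m+1-\mu} = -B_\mu.$$

This has the consequence that

$$\alpha_1 = \alpha_3 = \alpha_5 = \cdots = 0.$$

Thus the functions we compute contain only even powers of $z$ and consequently

$$\beta_1 = \beta_3 = \beta_5 = \cdots = 0,$$
$$M_1 = M_3 = M_5 = \cdots = 0,$$

and

$$\beta_2 = \alpha_2,$$
$$\beta_4 = \alpha_4 + \tfrac12 \alpha_2^2,$$
$$\beta_6 = \alpha_6 + \alpha_2\alpha_4 + \tfrac16 \alpha_2^3,$$
$$\beta_8 = \alpha_8 + \tfrac12 \alpha_4^2 + \alpha_2\alpha_6 + \tfrac12 \alpha_2^2\alpha_4 + \tfrac1{24}\alpha_2^4.$$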
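As mentioned before, we actually have $m = n - 1$ and $B_\mu = \cos\frac{\mu\pi}{n}$. Consequently

$$\begin{aligned}
\alpha_l &= \frac{1}{2l} \sum_{\mu=1}^{n-1} \cos^l \frac{\mu\pi}{n} \\
&\overset{7}{=} \frac{1}{2l} \Big\{ \sum_{\mu=0}^{n-1} \Big(\frac{e^{i\mu\pi/n} + e^{-i\mu\pi/n}}{2}\Big)^l - 1 \Big\} \\
&= \frac{1}{2^{l+1}\, l} \sum_{k=0}^{l} \binom{l}{k} \Big\{ \sum_{\mu=0}^{n-1} e^{i 2\pi\mu (k - \frac l2)/n} \Big\} - \frac{1}{2l}.
\end{aligned}$$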


The inner sum has obviously these values

$$\sum_{\mu=0}^{n-1} e^{i 2\pi\mu (k - \frac l2)/n} = \begin{cases} n & \text{if } k - \frac l2 \text{ is divisible by } n, \\ 0 & \text{otherwise.} \end{cases}$$


Also 


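Consequently

$$\alpha_l = \frac{n}{2^{l+1}\, l} \sum_{k}{}' \binom{l}{k} - \frac{1}{2l},$$

where $\sum_k'$ extends over those $k = 0, \dots, l$, for which $k - \frac l2$ is divisible by $n$.

Let us now determine the $k$ occurring in the following sum (as above, $k - \frac l2$ is divisible by $n$): $\sum_k'$. $k = \frac l2$ is clearly one of them. All others are of the form $k = \frac l2 \pm hn$, $h = 1, 2, \dots$. The term contributed is the same for $+$ and for $-$, since

$$\binom{l}{\frac l2 + hn} = \binom{l}{\frac l2 - hn}.$$

So we have

$$\alpha_l = \begin{cases} 0 & \text{for } l \text{ odd,} \\[4pt] \dfrac{n}{2^{l+1}\, l} \Big\{ \dbinom{l}{\frac l2} + 2 \sum_{h} \dbinom{l}{\frac l2 + hn} \Big\} - \dfrac{1}{2l} & \text{for } l \text{ even.} \end{cases}$$

7 As pointed out above, we need to consider only the even $l$.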
The number of terms which the sum $\sum_h$ contributes depends on the comparative sizes of $l$ and $n$. The number is clearly

0 for $\frac l2 < n$,

1 for $n \le \frac l2 < 2n$,

2 for $2n \le \frac l2 < 3n$,
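The closed expression for $\alpha_l$ in terms of binomial coefficients can be compared with the defining cosine sum. The following sketch assumes the even-$l$ formula $\alpha_l = \frac{1}{2l}\big\{\frac{n}{2^l}\big[\binom{l}{l/2} + 2\sum_{h \ge 1}\binom{l}{l/2 + hn}\big] - 1\big\}$ as read from the derivation above; the function names are illustrative only.

```python
import math


def alpha_direct(n, l):
    """alpha_l = (1/2l) * sum_{mu=1}^{n-1} cos^l(mu*pi/n), straight from the definition."""
    return sum(math.cos(mu * math.pi / n) ** l for mu in range(1, n)) / (2 * l)


def alpha_closed(n, l):
    """Closed form: zero for odd l, binomial sum over k = l/2 +- h*n for even l."""
    if l % 2 == 1:
        return 0.0
    half = l // 2
    total = math.comb(l, half)
    h = 1
    while half + h * n <= l:           # number of extra terms grows once l/2 >= n
        total += 2 * math.comb(l, half + h * n)
        h += 1
    return (n * total / 2 ** l - 1) / (2 * l)
```

The `while` loop is where the case distinction above enters: it adds no term for $\frac l2 < n$, one term for $n \le \frac l2 < 2n$, and so on, which is what produces the exceptional values for small $n$.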


Explicit formulae follow:$^{8}$

$$\alpha_1 = \alpha_3 = \alpha_5 = \alpha_7 = \alpha_9 = \cdots = 0,$$

$$\alpha_2 = \frac{n-2}{8} \qquad (0 \text{ for } n = 1),$$

$$\alpha_4 = \frac{3n-8}{64} \qquad (0 \text{ for } n = 1, 2),$$

$$\alpha_6 = \frac{5n-16}{192} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{1}{384},\ n = 3\big),$$

$$\alpha_8 = \frac{35n-128}{2048} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{1}{2048},\ n = 3;\ \tfrac{1}{128},\ n = 4\big).$$

$$\beta_1 = \beta_3 = \beta_5 = \beta_7 = \beta_9 = \cdots = 0,$$

$$\beta_2 = \frac{n-2}{8} \qquad (0 \text{ for } n = 1),$$

$$\beta_4 = \frac{n^2 + 2n - 12}{128} \qquad (0 \text{ for } n = 1, 2),$$

$$\beta_6 = \frac{n^3 + 12n^2 + 8n - 168}{3072} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024},\ n = 3\big),$$

$$\beta_8 = \frac{n^4 + 28n^3 + 212n^2 - 64n - 3696}{98304} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768},\ n = 3;\ \tfrac{35}{2048},\ n = 4\big).$$

$$M_1 = M_3 = M_5 = M_7 = M_9 = \cdots = 0,$$

$$M_2 = \frac{8}{(n-1)(n+1)}\,\beta_2 = \frac{n-2}{(n-1)(n+1)} \qquad (0 \text{ for } n = 1),$$

$$M_4 = \frac{384}{(n-1)(n+1)(n+3)(n+5)}\,\beta_4 = \frac{3(n^2 + 2n - 12)}{(n-1)(n+1)(n+3)(n+5)} \qquad (0 \text{ for } n = 1, 2),$$

$$M_6 = \frac{46080}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)}\,\beta_6 = \frac{15(n^3 + 12n^2 + 8n - 168)}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024},\ n = 3\big),$$

$$M_8 = \frac{10321920}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)(n+11)(n+13)}\,\beta_8 = \frac{105(n^4 + 28n^3 + 212n^2 - 64n - 3696)}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)(n+11)(n+13)} \qquad \big(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768},\ n = 3;\ \tfrac{112}{21879},\ n = 4\big).$$

8 The author wishes to express his thanks to Miss B. I. Hart for her kind help in carrying out these computations.
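The explicit moment formulae can be cross-checked against the general procedure of this section. The sketch below is an illustration only: it recomputes $M_2, \dots, M_8$ in floating point from $B_\mu = \cos\frac{\mu\pi}{n}$ (using the recurrence $p\,\beta_p = \sum_l l\,\alpha_l\,\beta_{p-l}$, which is an assumption of this sketch and not the paper's route) and evaluates the closed forms, which are valid without exception only for $n \ge 5$. Function names are hypothetical.

```python
import math


def epsilon_moments(n, order):
    """Moments M_0..M_order of epsilon from B_mu = cos(mu*pi/n), mu = 1..n-1."""
    B = [math.cos(mu * math.pi / n) for mu in range(1, n)]
    m = len(B)
    alpha = [0.0] + [sum(b ** l for b in B) / (2 * l) for l in range(1, order + 1)]
    beta = [1.0]
    for p in range(1, order + 1):
        beta.append(sum(l * alpha[l] * beta[p - l] for l in range(1, p + 1)) / p)
    result = []
    for p in range(order + 1):
        rising = math.prod(m / 2 + j for j in range(p))  # (m/2)(m/2+1)...(m/2+p-1)
        result.append(math.factorial(p) * beta[p] / rising)
    return result


def closed_forms(n):
    """Closed expressions for M_2, M_4, M_6, M_8 (no exceptional cases for n >= 5)."""
    d = [n + c for c in (-1, 1, 3, 5, 7, 9, 11, 13)]
    return {
        2: (n - 2) / math.prod(d[:2]),
        4: 3 * (n**2 + 2*n - 12) / math.prod(d[:4]),
        6: 15 * (n**3 + 12*n**2 + 8*n - 168) / math.prod(d[:6]),
        8: 105 * (n**4 + 28*n**3 + 212*n**2 - 64*n - 3696) / math.prod(d[:8]),
    }
```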


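We conclude this section by obtaining asymptotic formulae for the distribution of $\varepsilon$ when $n \to \infty$.

In this case our formulae show that all $\alpha_l$ ($l$ even) behave asymptotically like constant multiples of $n$. It also appears from our formulae for the $\beta_l$ ($l$ even), that

$$\beta_l = \frac{1}{(\frac l2)!}\,\alpha_2^{l/2} + \text{a polynomial in } \alpha_2, \alpha_4, \dots, \alpha_l \text{ of total order} \le \tfrac l2 - 1.$$

Consequently $\frac{1}{(\frac l2)!}\,\alpha_2^{l/2}$ is the dominant term in this expression, and so we have asymptotically

$$\beta_l \approx \frac{1}{(\frac l2)!} \Big(\frac n8\Big)^{l/2}.$$

From this

$$M_l \approx \frac{l!}{(\frac l2)!} \Big(\frac{1}{2n}\Big)^{l/2}.$$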
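Now the normal distribution

$$c_1 e^{-y^2/2\sigma_1^2}\, dy, \qquad \Big(c_1 = \frac{1}{\sqrt{2\pi}\,\sigma_1}\Big),$$

with the mean 0 and the standard deviation $\sigma_1$ has the moments

$$m_l = \int_{-\infty}^{\infty} y^l c_1 e^{-y^2/2\sigma_1^2}\, dy.$$

This is clearly 0 for $l$ odd, while for $l$ even$^{9}$

$$m_l = 2c_1 \int_0^\infty y^l e^{-y^2/2\sigma_1^2}\, dy = c_1 2^{\frac12(l+1)} \sigma_1^{l+1} \int_0^\infty e^{-u} u^{\frac12(l-1)}\, du = c_1 2^{\frac12(l+1)} \sigma_1^{l+1}\, \Gamma\Big(\frac{l+1}{2}\Big).$$

For $l = 0$ this becomes, since $m_0 = 1$,

$$1 = c_1 2^{1/2} \sigma_1\, \Gamma(\tfrac12).$$

Dividing the former equation by the latter gives, since

$$\frac{\Gamma\big(\frac{l+1}{2}\big)}{\Gamma(\frac12)} = \frac12 \cdot \frac32 \cdots \frac{l-1}{2},$$

$$m_l = 1 \cdot 3 \cdots (l-1)\, \sigma_1^l = \frac{l!}{(\frac l2)!} \Big(\frac{\sigma_1^2}{2}\Big)^{l/2}.$$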

Comparing the formulae for $M_l$ and for $m_l$ shows that $M_l \approx m_l$ if $\frac{\sigma_1^2}{2} = \frac{1}{2n}$, $\sigma_1 = \frac{1}{\sqrt n}$. So we see:

For $n \to \infty$ the distribution of $\varepsilon$ becomes asymptotically normal, with the mean 0 and the standard deviation $\sigma_1 = \frac{1}{\sqrt n}$. (The same result could be obtained by applying the general theorems of Liapounoff and others.)
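The asymptotic statement can be illustrated with the closed forms for $M_2$ and $M_4$ from this section: divided by the corresponding normal moments $\sigma_1^2$ and $3\sigma_1^4$ with $\sigma_1^2 = 1/n$, they should approach 1 as $n$ grows. A minimal sketch, with an illustrative function name:

```python
def normalized_even_moments(n):
    """Ratios M_2 / m_2 and M_4 / m_4, with m_l the normal moments for sigma_1^2 = 1/n."""
    m2 = (n - 2) / ((n - 1) * (n + 1))
    m4 = 3 * (n * n + 2 * n - 12) / ((n - 1) * (n + 1) * (n + 3) * (n + 5))
    sigma2 = 1.0 / n
    return m2 / sigma2, m4 / (3 * sigma2 ** 2)
```

For moderate $n$ the ratios are noticeably below 1 (the deviation is of order $1/n$), which is consistent with the exact distribution having a bounded range.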


5. The distribution law, general discussion. We return to the quantity $\gamma$, defined at the beginning of the preceding section, of which our $\varepsilon$ is a special case. We wish to obtain direct information concerning the distribution law of this $\gamma$. Since a permutation of the $B_\mu$ is permissible, we arrange them such that

$$B_1 \ge B_2 \ge \cdots \ge B_m.$$

(In the special case $\gamma = \varepsilon$, the $B_\mu = \cos\frac{\mu\pi}{n}$ are given in this arrangement.)

The distribution of $\gamma$ covers obviously the interval

$$B_1 \ge y \ge B_m.$$

And if not $B_1 = \cdots = B_m$, i.e. if $B_1 > B_m$, which we assume to be the case, then we have obviously a continuous distribution law for $\gamma$ in this interval. We denote it by $\omega(y)\,dy$.

9 Introduce the new integration variable $u = \frac{y^2}{2\sigma_1^2}$.
Assume for the moment that $B_m > 0$. Then the quantity

$$\gamma^{-\frac m2} = \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac m2}$$

is bounded, and we can therefore form its mean value. This is the $-\frac m2$ moment of $\gamma$ (cf. the beginning of the preceding section)

$$M_{-\frac m2} = \int_{B_m}^{B_1} y^{-\frac m2}\, \omega(y)\, dy.$$

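With any two $a > b > 0$ (we shall have $\frac ab \to \infty$ subsequently) form the quantity

$$\begin{aligned}
s(a, b) &= \int \cdots \int_{a^2 \ge \sum x_\mu^2 \ge b^2} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac m2} dx_1 \cdots dx_m \\
&\overset{10}{=} \int_b^a M_{-\frac m2}\, r^{-m}\, S_m r^{m-1}\, dr = S_m M_{-\frac m2} \int_b^a \frac{dr}{r} = S_m M_{-\frac m2} \ln\frac ab,
\end{aligned}$$

consider next

$$\begin{aligned}
t(a, b) &= \int \cdots \int_{a^2 \ge \sum B_\mu x_\mu^2 \ge b^2} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac m2} dx_1 \cdots dx_m \\
&\overset{11}{=} \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}} \int \cdots \int_{a^2 \ge \sum x_\mu^2 \ge b^2} \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{-\frac m2} dx_1 \cdots dx_m = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}\, S_m \ln\frac ab.
\end{aligned}$$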

10 Concerning this transformation to polar coordinates and the quantity $S_m$ cf. the first part of the preceding section.

11 Replace each variable $x_\mu$ by $\frac{1}{\sqrt{B_\mu}}\, x_\mu$.





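On the other hand, a comparison of their respective integration domains makes it clear that

$$t\big(\sqrt{B_m}\,a, \sqrt{B_1}\,b\big) \le s(a, b) \le t\big(\sqrt{B_1}\,a, \sqrt{B_m}\,b\big).$$

Thus

$$\frac{S_m}{\sqrt{\prod B_\mu}} \ln\frac{\sqrt{B_m}\,a}{\sqrt{B_1}\,b} \le S_m M_{-\frac m2} \ln\frac ab \le \frac{S_m}{\sqrt{\prod B_\mu}} \ln\frac{\sqrt{B_1}\,a}{\sqrt{B_m}\,b},$$

i.e.

$$\frac{1}{\sqrt{\prod B_\mu}} \Big(1 - \frac{\frac12 \ln\frac{B_1}{B_m}}{\ln\frac ab}\Big) \le M_{-\frac m2} \le \frac{1}{\sqrt{\prod B_\mu}} \Big(1 + \frac{\frac12 \ln\frac{B_1}{B_m}}{\ln\frac ab}\Big).$$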

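Now let $\frac ab \to \infty$, then

$$M_{-\frac m2} = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}$$

obtains, i.e.

$$\overline{\gamma^{-\frac m2}} = \int_{B_m}^{B_1} y^{-\frac m2}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}.$$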
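For $m = 2$ the relation $\overline{\gamma^{-m/2}} = 1/\sqrt{B_1 B_2}$ reduces to a classical integral over the unit circle and can be checked by direct quadrature. The sketch below is an illustration only; the function name and the step count are arbitrary choices, and $B_1, B_2 > 0$ is assumed.

```python
import math


def mean_inverse_gamma(b1, b2, steps=20000):
    """Mean of 1 / (b1*x1^2 + b2*x2^2) over the unit circle (m = 2), midpoint rule."""
    total = 0.0
    for j in range(steps):
        theta = 2.0 * math.pi * (j + 0.5) / steps
        total += 1.0 / (b1 * math.cos(theta) ** 2 + b2 * math.sin(theta) ** 2)
    return total / steps
```

Since the integrand is smooth and periodic, the midpoint rule converges very quickly here, so a modest number of steps already reproduces $1/\sqrt{B_1 B_2}$ closely.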


We now drop the assumption $B_m > 0$. We consider instead a real number $z$ with $z < B_m$. Replace each $B_\mu$ by $B_\mu - z$. Then the one with $\mu = m$ becomes $> 0$. And $\gamma$ is obviously replaced by $\gamma - z$. Consequently our above equation is now valid in the form

$$\overline{(\gamma - z)^{-\frac m2}} = \int_{B_m}^{B_1} (y - z)^{-\frac m2}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}}.$$

Let now $z$ be a complex variable. The second term of the above equation is a (locally) analytical function of $z$, except in the (real) interval $B_1 \ge z \ge B_m$. The third term, too, is a (locally) analytical function of $z$, except at the (real) points $B_1, \dots, B_m$. Consequently both are one-valued analytical functions of $z$ in the simply connected domain which obtains from the complex $z$ plane by exclusion of the (real) half line

$$z \ge B_m.$$


Hence the equation

$$(1) \qquad \int_{B_m}^{B_1} (y - z)^{-\frac m2}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}},$$



which holds for all (real) $z < B_m$, remains true for all complex $z$ of the above domain.$^{12}$

We observe next that $\omega(y)$ is an analytical function of $y$ in $B_1 \ge y \ge B_m$, whenever $y \ne B_1, \dots, B_m$. This is easily established by using any multiple integral expression for $\omega(y)$ which, while hard to evaluate explicitly, puts this analyticity into evidence.$^{13}$


12 $(y - z)^{-\frac m2}$ and the factors of $\frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}}$ are those branches of these analytical functions which are (real and) $> 0$ when $z$ is (real and) $< B_m$. When $m$ is even (as it will be, cf. below) the domain of analyticity is somewhat more extended, but we need not discuss this.

13 The computation which follows gives the desired analyticity in a simple way, and also makes it clear why the analyticity fails at $y = B_1, \dots, B_m$.

Consider the $y \ne B_1, \dots, B_m$ in $B_1 \ge y \ge B_m$. The probability of $\gamma \le y$ is $p(y) = \int_{B_m}^{y} \omega(y)\, dy$, and we may establish its analyticity instead of that of $p'(y) = \omega(y)$.

Obviously $p(y)$ is equally the probability of $\sum_{\mu=1}^{m} B_\mu x_\mu^2 \le y \sum_{\mu=1}^{m} x_\mu^2$, if the $x_1, \dots, x_m$ are equidistributed over a spherical surface $\sum_{\mu=1}^{m} x_\mu^2 = r^2$, any given $r > 0$.

Our hypotheses concerning $y$ imply $B_\nu > y > B_{\nu+1}$ for a suitable $\nu = 1, \dots, m-1$. Consider now the expression

$$f(y) = \int \cdots \int_{\sum (B_\mu - y) x_\mu^2 \le 0} e^{-\sum x_\mu^2}\, dx_1 \cdots dx_m.$$

Transforming to polar coordinates, we obtain

$$f(y) = \int_0^\infty e^{-r^2} S_m\, p(y)\, r^{m-1}\, dr = S_m \int_0^\infty e^{-r^2} r^{m-1}\, dr \cdot p(y).$$

($S_m$ as before.) Hence it suffices to establish the analyticity of $f(y)$. Now on the other hand

$$f(y) = \int \cdots \int_{\sum (B_\mu - y) x_\mu^2 \le 0} e^{-\sum x_\mu^2}\, dx_1 \cdots dx_m = \frac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y|}} \int \cdots \int_{\sum_{\mu \le \nu} w_\mu^2 - \sum_{\mu > \nu} w_\mu^2 \le 0} e^{-\sum \frac{w_\mu^2}{|B_\mu - y|}}\, dw_1 \cdots dw_m.$$

(We introduced the new variables $w_\mu = \sqrt{|B_\mu - y|}\, x_\mu$.) And this expression is clearly analytical in $y$, since $B_\nu > y > B_{\nu+1}$.
We shall need only the fact that $\omega(y)$ possesses $\frac m2$ continuous derivatives at these places. ($m$ will be assumed to be even, cf. below.) Its behavior at $y = B_1, \dots, B_m$ will follow from our subsequent results in all cases where we need it.

In order to determine $\omega(y)$ from (1), as we now propose to do, it is very convenient to assume that $m$ is even. We therefore make this assumption, and shall maintain it throughout most of what follows.

Consider a $y_0 \ne B_1, \dots, B_m$ in $B_1 \ge y_0 \ge B_m$. Then $B_\nu > y_0 > B_{\nu+1}$ for a suitable $\nu = 1, \dots, m-1$. Now put

$$z = y_0 + it \qquad (t \text{ real and} > 0),$$

form (1), take the imaginary parts of both sides, and let $t \to 0$.

Consider first the left-hand side of (1). Since $\omega(y)$ possesses $\frac m2$ continuous derivatives at $y = y_0$, we have

$$\omega(y) = \sum_{k=0}^{\frac m2 - 1} \theta_k (y - y_0)^k + \epsilon(y)(y - y_0)^{\frac m2}$$

with a bounded $\epsilon(y)$. Clearly

$$\theta_k = \frac{1}{k!} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = y_0}.$$

Thus, since $\omega(y)$ is real, all $\theta_k$ are real and $\epsilon(y)$ is also real.

Compute now the contribution of each one of the $\frac m2 + 1$ terms in the above expression for $\omega(y)$ to the imaginary part of the left-hand side of (1).

The last term, $\epsilon(y)(y - y_0)^{\frac m2}$, gives

$$\Im \int_{B_m}^{B_1} (y - y_0 - it)^{-\frac m2}\, \epsilon(y)(y - y_0)^{\frac m2}\, dy = \Im \int_{B_m}^{B_1} \Big(\frac{y - y_0}{y - y_0 - it}\Big)^{\frac m2} \epsilon(y)\, dy.$$

The integrand is uniformly bounded, and so the reality conditions cause the entire expression to $\to 0$ for $t \to 0$. Hence the contribution of this term is zero for $t \to 0$.

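The other $\frac m2$ terms correspond to $k = 0, 1, \dots, \frac m2 - 1$, the $k$ term being

$$\begin{aligned}
\Im \int_{B_m}^{B_1} (y - y_0 - it)^{-\frac m2}\, \theta_k (y - y_0)^k\, dy
&= \theta_k\, \Im \int_{B_m}^{B_1} \frac{(y - y_0)^k}{(y - y_0 - it)^{\frac m2}}\, dy \\
&= \theta_k\, \Im \int_{B_m}^{B_1} \frac{\sum_{h=0}^{k} \binom kh (it)^h (y - y_0 - it)^{k-h}}{(y - y_0 - it)^{\frac m2}}\, dy \\
&= \theta_k \sum_{h=0}^{k} \binom kh\, \Im \Big\{ (it)^h \int_{B_m}^{B_1} (y - y_0 - it)^{k - h - \frac m2}\, dy \Big\}.
\end{aligned}$$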
The exponent $k - h - \frac m2$ in the integral is always $\le \big(\frac m2 - 1\big) - 0 - \frac m2 = -1$, and it is $= -1$ if and only if $k = \frac m2 - 1$, $h = 0$. Consider first a term where this is not the case, i.e. where the exponent $k - h - \frac m2 < -1$. For such a term the expression $\Im\{\cdots\}$ becomes

$$\Im \Big\{ (it)^h\, \frac{1}{k - h - \frac m2 + 1} \Big[ (y - y_0 - it)^{k - h - \frac m2 + 1} \Big]_{y = B_m}^{y = B_1} \Big\}.$$

For $t \to 0$ the last factors are bounded and real, and so the entire expression $\to 0$: for $h = 0$ because of the reality conditions, for $h > 0$ because of $(it)^h \to 0$.

Thus only the term $k = \frac m2 - 1$, $h = 0$ can contribute something else than zero for $t \to 0$.

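Now this term is equal to

$$\theta_{\frac m2 - 1}\, \Im \big\{ \ln(y - y_0 - it) \big\}_{y = B_m}^{y = B_1},$$

and for $t \to 0$ this converges to$^{14}$

$$\theta_{\frac m2 - 1}\, \Im(i\pi) = \pi\, \theta_{\frac m2 - 1} = \frac{\pi}{(\frac m2 - 1)!} \Big\{ \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y) \Big\}_{y = y_0}.$$

Thus the imaginary part of the entire left-hand side of (1) converges for $t \to 0$ to this expression.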

The right-hand side of (1) is easier to discuss. The imaginary part under consideration is now

$$\Im \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - y_0 - it)}} = \Im \prod_{\mu=1}^{m} (B_\mu - y_0 - it)^{-\frac12}.$$

Considering $^{12}$ (its $z$ is our $y_0 + it$), this converges for $t \to 0$ to$^{15}$

$$\Im \prod_{\mu=1}^{\nu} (B_\mu - y_0)^{-\frac12} \prod_{\mu=\nu+1}^{m} i\,(y_0 - B_\mu)^{-\frac12} = \Im \frac{i^{m-\nu}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}}.$$

14 This evaluation of $\{\ln(y - y_0 - it)\}_{y=B_m}^{y=B_1}$ is based on $t > 0$, and the fact that $y$ moves on the real axis from $B_m$ to $B_1$. It has no connection with $^{12}$.

15 The square roots of the (real and) $> 0$ quantities

$$B_\mu - y_0\ (\mu = 1, \dots, \nu), \qquad y_0 - B_\mu\ (\mu = \nu + 1, \dots, m), \qquad \text{and } |B_\mu - y_0|$$

are taken to be $> 0$.
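If $\nu$ (hence $m - \nu$) is even, then this is zero. If $\nu$ (hence $m - \nu$) is odd, then this is equal to $\frac{(-1)^{\frac12(m - \nu - 1)}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}}$. Thus (1) becomes the following equation:

$$\frac{\pi}{(\frac m2 - 1)!} \Big\{ \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y) \Big\}_{y = y_0} = \begin{cases} 0 & \text{if } \nu \text{ is even,} \\[4pt] \dfrac{(-1)^{\frac12(m - \nu - 1)}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}} & \text{if } \nu \text{ is odd.} \end{cases}$$

Simplifying this, and writing $y$ for $y_0$, and also restating the definition of $\nu$ gives

$$(2) \qquad \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y) = \begin{cases} 0 & \text{if } \nu \text{ is even,} \\[4pt] \dfrac{(\frac m2 - 1)!}{\pi}\, \dfrac{(-1)^{\frac12(m - \nu - 1)}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y|}} & \text{if } \nu \text{ is odd,} \end{cases}$$

$$B_\nu > y > B_{\nu+1}, \qquad \nu = 1, \dots, m-1.$$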

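Observe finally, that if we put

$$\mathfrak{A}(y) = \prod_{\mu=1}^{m} (y - B_\mu),$$

then this product has $\nu$ factors $< 0$ ($\mu = 1, \dots, \nu$), while the others are $> 0$. So

$$\mathfrak{A}(y) \gtrless 0 \quad \text{for } \nu \text{ even or odd, respectively,}$$

and in the latter case

$$\prod_{\mu=1}^{m} |B_\mu - y| = -\mathfrak{A}(y).$$

It is clear how we may now rewrite (2).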
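The simplest instance of (2) is $m = 2$ (i.e. $n = 3$, $B_1 = \tfrac12$, $B_2 = -\tfrac12$), where no differentiation is involved and $\nu = 1$ is odd, so that (2) gives the density itself: $\omega(y) = \frac{1}{\pi\sqrt{(\frac12 - y)(y + \frac12)}}$. The sketch below checks this against the moments of section 4; the substitution $y = \tfrac12 \sin t$ is a choice made here to avoid the endpoint singularities, and the function names are illustrative.

```python
import math


def omega_n3(y):
    """Density of epsilon for n = 3 as read from (2) with m = 2, nu = 1."""
    return 1.0 / (math.pi * math.sqrt((0.5 - y) * (y + 0.5)))


def moment_n3(p, steps=20000):
    """Integral of y^p * omega(y) dy; with y = sin(t)/2, omega(y) dy becomes dt/pi."""
    total = 0.0
    for j in range(steps):
        t = -math.pi / 2 + math.pi * (j + 0.5) / steps
        total += (0.5 * math.sin(t)) ** p
    return total / steps
```

With $n = 3$ the formulae of section 4 give $M_2 = \frac18$ and $M_4 = \frac{3}{128}$, which is what the quadrature should reproduce.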

We are now in a position to determine the behavior of $\omega(y)$ at $y = B_1, \dots, B_m$ too, since we know how its $\big(\frac m2 - 1\big)$-th derivative behaves in the immediate vicinity of these places. (2) shows that it is singular there, and that the nature of the singularity depends on the number of the $\mu$, for which $B_\mu$ is equal to the $y$ in question, i.e. on the multiplicity of this root of our polynomial $\mathfrak{A}(y)$.

In our actual application (to $\gamma = \varepsilon$, cf. the beginning of this section) the $B_\mu$ are pairwise different, i.e. all root multiplicities of $\mathfrak{A}(y)$ are equal to one. A further special case, which has a certain interest of its own, is when the $B_\mu$ are equal two by two, but otherwise different, i.e. all root multiplicities of $\mathfrak{A}(y)$ are equal to two. In the discussion which follows we shall therefore assume that one or the other of these two cases occurs.
In the first case $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)$ has on each side of a $y = B_\mu$ one of these two behaviors: It is identically zero, or it is singular, of the type $\frac{1}{\sqrt{|y - B_\mu|}}$. Thus it is at any rate integrable. Consequently $\frac{d^{\frac m2 - 2}}{dy^{\frac m2 - 2}}\, \omega(y)$ is continuous on each side of $y = B_\mu$, i.e. for both $y = B_\mu \pm 0$. Successive integrations give now that all $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous for both $y = B_\mu \pm 0$.

In the second case we have $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So the $\nu$ with $B_\nu > y > B_{\nu+1}$ is necessarily even, and $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)$ is identically zero for all $y$ of (2). Consequently $\frac{d^{\frac m2 - 2}}{dy^{\frac m2 - 2}}\, \omega(y)$ is again continuous on each side of $y = B_\mu$, i.e. for both $y = B_\mu \pm 0$. Successive integrations show again that all $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous for both $y = B_\mu \pm 0$.

We must therefore discuss only how much the $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, change from $y = B_\mu - 0$ to $y = B_\mu + 0$.

Let us return to the procedure by which we derived (2) from (1). We put again

$$z = y_0 + it \qquad (t \text{ real and} > 0)$$

and let $t \to 0$. But we consider now (1) itself (and not merely its imaginary part), and we choose a $y_0 = B_\nu$.

Consider first the left-hand side of (1), always disregarding terms which stay bounded for $t \to 0$. Then we can replace the integral $\int_{B_m}^{B_1}$ of (1) by any $\int_{B_\nu - a}^{B_\nu + a}$ with any fixed $a > 0$, and this is equal to

$$\int_{B_\nu - a}^{B_\nu - 0} + \int_{B_\nu + 0}^{B_\nu + a}.$$

We choose this $a > 0$ so small that no $B_\mu \ne B_\nu$ lies between $B_\nu - a$ and $B_\nu + a$. I.e. all $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous from $B_\nu - a$ to $B_\nu - 0$ and also from $B_\nu + 0$ to $B_\nu + a$.
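This being the case, we can evaluate the above sum of two integrals by $\frac m2 - 1$ successive partial integrations. Thus we get

$$\begin{aligned}
\int_{B_\nu - a}^{B_\nu + a} (y - B_\nu - it)^{-\frac m2}\, \omega(y)\, dy
&= -\sum_{k=0}^{\frac m2 - 2} \frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!} \Big[ (y - B_\nu - it)^{-\frac m2 + 1 + k}\, \frac{d^k}{dy^k}\, \omega(y) \Big]_{y = B_\nu - a}^{y = B_\nu - 0} \\
&\quad - \sum_{k=0}^{\frac m2 - 2} \frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!} \Big[ (y - B_\nu - it)^{-\frac m2 + 1 + k}\, \frac{d^k}{dy^k}\, \omega(y) \Big]_{y = B_\nu + 0}^{y = B_\nu + a} \\
&\quad + \frac{1}{(\frac m2 - 1)!} \int_{B_\nu - a}^{B_\nu + a} (y - B_\nu - it)^{-1}\, \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)\, dy.
\end{aligned}$$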


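In the first two lines the $y = B_\nu \pm a$ terms are bounded for $t \to 0$, therefore only the $y = B_\nu \pm 0$ terms need be kept. Then the first two lines give

$$\sum_{k=0}^{\frac m2 - 2} \frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!}\, (-it)^{-\frac m2 + 1 + k} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu - 0}^{y = B_\nu + 0}$$

up to terms which stay bounded for $t \to 0$. Consider now the third line. We know that the $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)$ in its integrand can be majorized by $\frac{c_2}{\sqrt{|y - B_\nu|}}$ (for a suitable constant $c_2$, cf. our discussion preceding the present one). Thus the integral in question is majorized by

$$\int_{B_\nu - a}^{B_\nu + a} |y - B_\nu - it|^{-1}\, c_2\, |y - B_\nu|^{-\frac12}\, dy,$$

hence a fortiori by

$$\begin{aligned}
\int_{-\infty}^{\infty} |y - B_\nu - it|^{-1}\, c_2\, |y - B_\nu|^{-\frac12}\, dy
&\overset{16}{=} c_2\, t^{-\frac12} \int_{-\infty}^{\infty} |u - i|^{-1}\, |u|^{-\frac12}\, du \\
&= c_2\, t^{-\frac12} \int_{-\infty}^{\infty} \frac{du}{\sqrt{(u^2 + 1)\,|u|}} \\
&\overset{17}{=} 4 c_2\, t^{-\frac12} \int_0^\infty \frac{dv}{\sqrt{v^4 + 1}}.
\end{aligned}$$

16 Introduce the new integration variable $u = \frac{y - B_\nu}{t}$.

17 Introduce the new integration variable $v = \sqrt{|u|}$.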
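Since the last integral is obviously finite, the entire expression is $O(t^{-\frac12})$ for $t \to 0$.

Consequently the left-hand side of (1) is equal to

$$\sum_{k=0}^{\frac m2 - 2} \frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!}\, (-it)^{-\frac m2 + 1 + k} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu - 0}^{y = B_\nu + 0} + O(t^{-\frac12}),$$

for $t \to 0$. (For $B_\nu = B_1$ or $B_m$ the $\frac{d^k}{dy^k}\, \omega(y)$ at $y = B_\nu + 0$ or $B_\nu - 0$, respectively, must obviously be taken to be zero.)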

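Consider now the right-hand side of (1).

We first suppose the $B_\mu$ are pairwise different. The right-hand side in question is $\frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - B_\nu - it)}}$, i.e. $O(t^{-\frac12})$.

Secondly let us consider $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So we may assume $\nu = 2\lambda$, $\lambda = 1, \dots, \frac m2$. The right-hand side of (1) becomes now a rational function, $\frac{1}{\prod_{k=1}^{m/2} (B_{2k} - z)}$. (The sign is determined by $^{12}$.) So in our case

$$\frac{1}{(B_{2\lambda} - B_{2\lambda} - it) \prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda} - it) \prod_{k=\lambda+1}^{m/2} (B_{2k} - B_{2\lambda} - it)} = \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}\, (-it)^{-1} + O(1).$$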


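Comparing these with our above expression gives therefore (for $t \to 0$)

$$\sum_{k=0}^{\frac m2 - 2} \frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!}\, (-it)^{-\frac m2 + 1 + k} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu - 0}^{y = B_\nu + 0} = \begin{cases} O(t^{-\frac12}) & \text{in the first case,} \\[4pt] \dfrac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}\, (-it)^{-1} + O(t^{-\frac12}) & \text{in the second case.} \end{cases}$$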

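In this formula the left-hand side is a polynomial in $(-it)^{-1}$. Hence the $O(t^{-\frac12})$ terms on the right-hand side must vanish, and otherwise all powers of $-it$ must have the same coefficient on both sides. Consequently

$$\frac{(\frac m2 - 2 - k)!}{(\frac m2 - 1)!} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu - 0}^{y = B_\nu + 0}$$

must vanish, except in the second case for the one value of $k$ with $-\frac m2 + 1 + k = -1$, i.e. $k = \frac m2 - 2$. So, with this one exception, we have

$$\Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu + 0} = \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = B_\nu - 0}.$$

And in the exceptional case (second case, $\nu = 2\lambda$)

$$\Big\{ \frac{d^{\frac m2 - 2}}{dy^{\frac m2 - 2}}\, \omega(y) \Big\}_{y = B_{2\lambda} - 0}^{y = B_{2\lambda} + 0} = \Big(\frac m2 - 1\Big)!\, \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

Thus in the first case all derivatives $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous even at $y = B_1, \dots, B_m$.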

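In the second case the same is true for $k = 0, 1, \dots, \frac m2 - 3$, but the derivative with $k = \frac m2 - 2$ behaves differently for $y = B_1, \dots, B_m$. Indeed, for $y = B_{2\lambda - 1} = B_{2\lambda}$, $\lambda = 1, \dots, \frac m2$, this derivative is continuous for both $y = B_{2\lambda} \pm 0$, but it increases from $B_{2\lambda} - 0$ to $B_{2\lambda} + 0$ by

$$\Big(\frac m2 - 1\Big)!\, \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

(At $y = B_1 + 0$ and $B_m - 0$ the $\frac{d^k}{dy^k}\, \omega(y)$ must be thought to continue with the value zero.)

These rules, together with (2), determine $\omega(y)$ completely.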

6. First special case. We consider the first special case, where the $B_\mu$ are pairwise different. We immediately specialize further, to $\gamma = \varepsilon$, i.e. $m = n - 1$, $B_\mu = \cos\frac{\mu\pi}{n}$ ($\mu = 1, \dots, n-1$). (Cf. the beginning of the preceding section.)

Since $m$ must be even, $n$ must be odd. The rules of section 5 determine $\omega(y)$; in particular all derivatives $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac{n-1}{2} - 2$, are everywhere continuous, beginning and ending with zero at $y = B_1$ and $B_{n-1}$, respectively.

In the even intervals

$$B_2 \ge y \ge B_3, \quad B_4 \ge y \ge B_5, \quad \dots, \quad B_{n-3} \ge y \ge B_{n-2},$$

the derivative $\frac{d^{\frac12(n-1) - 1}}{dy^{\frac12(n-1) - 1}}\, \omega(y)$ is zero, i.e. $\omega(y)$ is a polynomial of degree $\frac12(n-1) - 2$. In the odd intervals

$$B_1 \ge y \ge B_2, \quad B_3 \ge y \ge B_4, \quad \dots, \quad B_{n-2} \ge y \ge B_{n-1},$$

we have

$$\frac{d^{\frac12(n-1) - 1}}{dy^{\frac12(n-1) - 1}}\, \omega(y) = \pm \frac{(\frac12(n-1) - 1)!}{\pi}\, \frac{1}{\sqrt{-\mathfrak{A}(y)}}$$

(the sign $\pm$ is alternating $(-1)^{\frac12(n-1) - 1}, (-1)^{\frac12(n-1) - 2}, \dots, +$), where

$$\mathfrak{A}(y) = \prod_{\mu=1}^{n-1} \Big(y - \cos\frac{\mu\pi}{n}\Big).$$
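Another expression for $\mathfrak{A}(y)$ may be found by the following method. Clearly

$$\frac{\sin(n\varphi)}{\sin\varphi} = \frac{e^{in\varphi} - e^{-in\varphi}}{e^{i\varphi} - e^{-i\varphi}} = \sum_{\mu=0}^{n-1} e^{i(n - 1 - 2\mu)\varphi}$$

is a polynomial of $\cos\varphi = \frac12\big(e^{i\varphi} + e^{-i\varphi}\big)$ of degree $n - 1$, with the highest coefficient $2^{n-1}$. For $\varphi = \frac{\mu\pi}{n}$, $\mu = 1, \dots, n-1$, $\sin(n\varphi) = 0$, $\sin\varphi \ne 0$, hence $\frac{\sin(n\varphi)}{\sin\varphi}$, as a polynomial in $\cos\varphi$, has the same roots as $\mathfrak{A}(y)$. $\mathfrak{A}(y)$ is a polynomial of degree $n - 1$ with the highest coefficient 1. Consequently

$$\mathfrak{A}(\cos\varphi) = \frac{1}{2^{n-1}}\, \frac{\sin(n\varphi)}{\sin\varphi}.$$

This formula allows one to compute $\mathfrak{A}(y)$ quickly; examples are

$$n = 3: \quad \mathfrak{A}(y) = y^2 - \tfrac14,$$
$$n = 5: \quad \mathfrak{A}(y) = y^4 - \tfrac34 y^2 + \tfrac1{16},$$
$$n = 7: \quad \mathfrak{A}(y) = y^6 - \tfrac54 y^4 + \tfrac38 y^2 - \tfrac1{64}.$$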
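The identity $\mathfrak{A}(\cos\varphi) = 2^{-(n-1)} \sin(n\varphi)/\sin\varphi$ lends itself to a short computation. The sketch below is an assumption-laden illustration rather than the paper's procedure: it relies on the standard fact that $\sin(n\varphi)/\sin\varphi$, as a polynomial in $y = \cos\varphi$, satisfies the three-term recurrence $U_{k+1} = 2y\,U_k - U_{k-1}$ with $U_0 = 1$, $U_1 = 2y$ (Chebyshev polynomials of the second kind). The function name is hypothetical.

```python
from fractions import Fraction


def monic_polynomial(n):
    """Coefficients of A(y) = 2**-(n-1) * sin(n*phi)/sin(phi), y = cos(phi), lowest power first."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(2)]   # U_0 = 1, U_1 = 2y
    if n == 1:
        return prev
    for _ in range(n - 2):
        nxt = [Fraction(0)] + [2 * c for c in cur]          # 2*y*U_k
        for i, c in enumerate(prev):
            nxt[i] -= c                                      # minus U_{k-1}
        prev, cur = cur, nxt
    return [c / 2 ** (n - 1) for c in cur]                   # normalize to leading coefficient 1
```

Called with $n = 3, 5, 7$, this yields coefficient lists that can be compared term by term with the three examples above.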

The number of odd intervals, on which integrations must be carried out, is $\frac12(n-1)$, but since those which are symmetric with respect to 0 require the same computations, only $\frac14(n-1)$ or $\frac14(n+1)$ must be considered. So there are 1, 1, 2, $\dots$ such intervals for $n = 3, 5, 7, \dots$ respectively. The integrals are first elementary (arcsin), then elliptic, then hyperelliptic.

Numerical computations for $n = 3$ are immediate; for $n = 5, 7$ they have been carried out with considerable precision by B. I. Hart.

At $y = B_\mu$, $\frac{d^{\frac12(n-1) - 1}}{dy^{\frac12(n-1) - 1}}\, \omega(y)$ has a singularity of the type $\frac{1}{\sqrt{|y - B_\mu|}}$ (cf. the end of section 5), while all $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac12(n-1) - 2$, are continuous. At $y = B_1$ and $B_{n-1}$, in particular, they are zero. Hence it follows by successive integrations that the order of vanishing of $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac12(n-1) - 2$, at $y = B_1$ and $B_{n-1}$ is $\big(\frac12(n-1) - 1\big) - k - \frac12 = \frac n2 - 2 - k$. In particular for $k = 0$ we find that at its maximum and at its minimum, $B_1$ and $B_{n-1}$, i.e. $\pm\cos\frac{\pi}{n}$, the order of vanishing of $\omega(y)$ is $\frac n2 - 2$.$^{18}$

Since $\omega(y)$ has this property, and since it is obviously an even function of $y$, R. H. Kent has suggested approximating it by a series expansion of the form

$$(3) \qquad \omega(y) = \sum_{h=0}^{\infty} a_h \Big(\cos^2\frac{\pi}{n} - y^2\Big)^{\frac n2 - 2 + h}.$$

Computations by B. I. Hart, not yet published, have shown that even the use of the first four terms ($h = 0, 1, 2, 3$, the $a_h$ being determined by the condition of normalization and by the first three even moments of the actual distribution given in section 4) gives excellent approximations. The use of the formula (3) suggests itself likewise for even values of $n$.


7. Second special case. We consider now the second special case, where $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. This has no immediate bearing on our original problem (cf. the preceding section), but we shall nevertheless discuss it for the two following reasons. First, it is hoped that the reader will find an independent interest in the simple and complete results which can be obtained in this case. Second, there are various modifications of our original problem, which lead to this case. For example let the $x_1, \dots, x_n$ in our original problem, as described in section 1, be complex numbers instead of real ones, replacing all squares by absolute value squares. Then one verifies easily that all characteristic values $A_1, \dots, A_{n-1}$ are doubled, and so our first case goes over into our second case. (This amounts to replacing our quadratic forms by Hermitian forms, cf. $^{4}$.) It is easy to imagine two-dimensional problems where this set-up is natural.


We put $C_\lambda = B_{2\lambda - 1} = B_{2\lambda}$ for $\lambda = 1, \dots, \frac m2$, so that $C_1 > C_2 > \cdots > C_{m/2}$ are the only restrictions imposed.

Every $y$ in $B_1 \ge y \ge B_m$, i.e. in $C_1 \ge y \ge C_{m/2}$, lies in an interval $C_\lambda \ge y \ge C_{\lambda+1}$, i.e. $B_{2\lambda} \ge y \ge B_{2\lambda+1}$. That is, the $\nu$ of (2) is always even, and so $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)$ is zero in every one of these intervals. Therefore $\omega(y)$ is a polynomial of degree $\frac m2 - 2$ in every one of these intervals. We have already shown that $\omega(y)$ is not the same polynomial in each interval. Thus $\omega(y)$ is represented by $\frac m2 - 1$ polynomials of degree $\frac m2 - 2$ in the $\frac m2 - 1$ intervals

$$C_1 \ge y \ge C_2, \quad C_2 \ge y \ge C_3, \quad \dots, \quad C_{\frac m2 - 1} \ge y \ge C_{\frac m2}.$$

18 We omit the simple discussion of $n = 3$, which must be excluded from this result.

We could try to obtain explicit expressions for these polynomials by a direct application of the results at the close of section 5. A characterization of the distribution can, however, be obtained in a more elegant way by an indirect procedure.

Consider an arbitrary function $\mathfrak{F}(y)$. We wish to express its mean

$$\overline{\mathfrak{F}(\gamma)} = \int_{C_{m/2}}^{C_1} \mathfrak{F}(y)\, \omega(y)\, dy.$$

If we can do this for all $\mathfrak{F}(y)$ then the distribution is completely characterized.

We select first an $\big(\frac m2 - 1\big)$-fold primitive function of $\mathfrak{F}(y)$, i.e. a function $\mathfrak{G}(y)$ with

$$\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \mathfrak{G}(y) = \mathfrak{F}(y).$$

Of course $\mathfrak{G}(y)$ is determined only up to an additive polynomial of degree $\frac m2 - 2$ in $y$.

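Now the above expectation value becomes

$$\overline{\mathfrak{F}(\gamma)} = \int_{C_{m/2}}^{C_1} \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \mathfrak{G}(y) \cdot \omega(y)\, dy = \sum_{\lambda=1}^{\frac m2 - 1} \int_{C_{\lambda+1} + 0}^{C_\lambda - 0} \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \mathfrak{G}(y) \cdot \omega(y)\, dy.$$

Since all $\frac{d^k}{dy^k}\, \omega(y)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous from $C_{\lambda+1} + 0$ to $C_\lambda - 0$ for all $\lambda = 1, \dots, \frac m2 - 1$, we can evaluate each integral of the above sum by $\frac m2 - 1$ successive partial integrations. Thus the following expression obtains:

$$\begin{aligned}
&\sum_{\lambda=1}^{\frac m2 - 1} \sum_{k=0}^{\frac m2 - 2} (-1)^k \Big[ \frac{d^{\frac m2 - k - 2}}{dy^{\frac m2 - k - 2}}\, \mathfrak{G}(y) \cdot \frac{d^k}{dy^k}\, \omega(y) \Big]_{y = C_{\lambda+1} + 0}^{y = C_\lambda - 0} \\
&\quad + (-1)^{\frac m2 - 1} \int_{C_{m/2}}^{C_1} \mathfrak{G}(y)\, \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)\, dy.
\end{aligned}$$

Considering the definition of $\mathfrak{G}(y)$ as an $\big(\frac m2 - 1\big)$-fold primitive function, the $\frac{d^{k'}}{dy^{k'}}\, \mathfrak{G}(y)$, $k' = 0, 1, \dots, \frac m2 - 2$, are everywhere continuous. This corresponds to $k' = \frac m2 - k - 2$, $k = 0, 1, \dots, \frac m2 - 2$. Hence the first line can be rewritten as

$$-\sum_{\lambda=1}^{\frac m2} \sum_{k=0}^{\frac m2 - 2} (-1)^k \Big\{ \frac{d^{\frac m2 - k - 2}}{dy^{\frac m2 - k - 2}}\, \mathfrak{G}(y) \Big\}_{y = C_\lambda} \Big\{ \frac{d^k}{dy^k}\, \omega(y) \Big\}_{y = C_\lambda - 0}^{y = C_\lambda + 0}.$$

(For $C_\lambda = C_1$ or $C_{m/2}$ the $\frac{d^k}{dy^k}\, \omega(y)$ at $y = C_1 + 0$ or $C_{m/2} - 0$, respectively, must obviously be taken to be zero.) Owing to the results of section 5 all terms with $k = 0, 1, \dots, \frac m2 - 3$ vanish, and the term with $k = \frac m2 - 2$ gives

$$\begin{aligned}
&-(-1)^{\frac m2 - 2} \sum_{\lambda=1}^{\frac m2} \mathfrak{G}(C_\lambda) \Big(\frac m2 - 1\Big)!\, \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)} \\
&\quad = \Big(\frac m2 - 1\Big)! \sum_{\lambda=1}^{\frac m2} \frac{(-1)^{\lambda - 1}\, \mathfrak{G}(C_\lambda)}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}.
\end{aligned}$$

The second line vanishes, since $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \omega(y)$ is zero everywhere, as observed above.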

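Finally

$$\overline{\mathfrak{F}(\gamma)} = \Big(\frac m2 - 1\Big)! \sum_{\lambda=1}^{\frac m2} \frac{(-1)^{\lambda - 1}\, \mathfrak{G}(C_\lambda)}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}.$$

For

$$\mathfrak{B}(z) = \prod_{k=1}^{\frac m2} (z - C_k)$$

we have

$$\Big\{ \frac{d}{dz}\, \mathfrak{B}(z) \Big\}_{z = C_\lambda} = \prod_{k=1,\ k \ne \lambda}^{\frac m2} (C_\lambda - C_k) = (-1)^{\lambda - 1} \prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k).$$

Therefore the above formula can also be written

$$\overline{\mathfrak{F}(\gamma)} = \Big(\frac m2 - 1\Big)! \sum_{\lambda=1}^{\frac m2} \frac{\mathfrak{G}(C_\lambda)}{\big\{ \frac{d}{dz}\, \mathfrak{B}(z) \big\}_{z = C_\lambda}}.$$

Observe that the right-hand side of the above formula (which can also be easily expressed in terms of determinants) is a well-known approximate expression for $\frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \mathfrak{G}(y)$, as a (repeated) difference quotient of the values $\mathfrak{G}(C_\lambda)$, $\lambda = 1, \dots, \frac m2$. It is therefore very satisfactory that this expression gives the mean of

$$\mathfrak{F}(y) = \frac{d^{\frac m2 - 1}}{dy^{\frac m2 - 1}}\, \mathfrak{G}(y).$$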
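The difference-quotient formula for the mean can be tried out on powers $\mathfrak{F}(y) = y^p$, for which an $\big(\frac m2 - 1\big)$-fold primitive is $\mathfrak{G}(y) = \frac{p!}{(p + N)!}\, y^{p+N}$ with $N = \frac m2 - 1$. The sketch below is an illustration in exact arithmetic; the function name is hypothetical, and the choice of primitive (with zero additive polynomial) is an assumption made here for convenience.

```python
from fractions import Fraction
from math import factorial


def mean_of_power(C, p):
    """Mean of gamma**p when the B_mu take each value in C exactly twice (m = 2*len(C))."""
    N = len(C) - 1                       # N = m/2 - 1

    def G(y):                            # an N-fold primitive of y**p
        return Fraction(y) ** (p + N) * factorial(p) / factorial(p + N)

    total = Fraction(0)
    for lam, c in enumerate(C):
        derivative = Fraction(1)         # product of (C_lambda - C_k) over k != lambda
        for k, other in enumerate(C):
            if k != lam:
                derivative *= Fraction(c) - Fraction(other)
        total += G(c) / derivative
    return factorial(N) * total
```

For $m = 4$, $C = (3, 1)$ the density is a polynomial of degree 0, i.e. $\gamma$ is uniform on $1 \le y \le 3$, which gives a direct check; for $C = (2, 1, 0)$ the second moment can be compared with $M_2 = \frac76$ obtained from the formulae of section 4.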


Appendix. We return to the normal distribution of x\, • • • , x„ as described 
in section 1 , and to the quantities «*, 5 2 , ?j given there. We denote means with 
respect to that distribution by (•••)• 

It was observed by B. I. Hart and mentioned by J. D. Williams,* by comparing
the known expressions for their moments, that every moment of $\eta = \delta^2 / s^2$
is the quotient of the corresponding moments of $\delta^2$ and of $s^2$. That is

$$\overline{\eta^p} = \frac{\overline{\delta^{2p}}}{\overline{s^{2p}}}, \qquad (p = 0, 1, 2, \cdots).$$


This indicates some kind of independence relation involving $\delta^2$ and $s^2$. The
considerations which follow are intended to clarify this situation.

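The above relation may be written

$$\overline{s^{2p} \eta^p} = \overline{s^{2p}} \cdot \overline{\eta^p},$$

or, more generally,

$$\overline{s^p \eta^q} = \overline{s^p} \cdot \overline{\eta^q}.$$

We shall prove this by showing that $s$ and $\eta$ are statistically independent.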

We can, as in section 2, make the mean $\xi = 0$, i.e. obtain the $x_1, \cdots, x_n$
distribution law

$$c^n e^{-\sum_{\mu=1}^{n} x_\mu^2 / 2\sigma^2} \, dx_1 \cdots dx_n.$$

And then, again as in section 2, perform a linear orthogonal transformation,
carrying $x_1, \cdots, x_n$ into, say $x'_1, \cdots, x'_n$ which leaves the distribution law in
its original form

$$c^n e^{-\sum_{\mu=1}^{n} x'^2_\mu / 2\sigma^2} \, dx'_1 \cdots dx'_n,$$


1 n-l 


fl pml 


n 


Z 


n-l v ' 1 „ 

Z 


and makes 
Since $x'_n$ does not occur in $s^2$, $\eta$ we must use only the $x'_1, \cdots, x'_{n-1}$
distribution law

$$c^{n-1} e^{-\sum_{\mu=1}^{n-1} x'^2_\mu / 2\sigma^2} \, dx'_1 \cdots dx'_{n-1}.$$

Now we introduce polar coordinates with respect to $x'_1, \cdots, x'_{n-1}$. These
consist of a radius $r$ with

$$r^2 = \sum_{\mu=1}^{n-1} x'^2_\mu,$$

and $n - 2$ angular variables $\varphi_1, \cdots, \varphi_{n-2}$, which can be chosen in various
ways, and which we need not describe more closely. At any rate

$$dx'_1 \cdots dx'_{n-1} = r^{n-2} \, dr \, w(\varphi_1, \cdots, \varphi_{n-2}) \, d\varphi_1 \cdots d\varphi_{n-2}$$

where we need not determine the weight function $w(\varphi_1, \cdots, \varphi_{n-2})$.
Consequently the distribution law is

$$c^{n-1} e^{-r^2 / 2\sigma^2} r^{n-2} \, dr \, w(\varphi_1, \cdots, \varphi_{n-2}) \, d\varphi_1 \cdots d\varphi_{n-2}.$$

Thus the coordinate $r$ and the coordinates $\varphi_1, \cdots, \varphi_{n-2}$ are independent of each
other.

Next

$$s^2 = \frac{1}{n} r^2,$$

and $\eta$ is a homogeneous function of $x'_1, \cdots, x'_{n-1}$ of degree zero, i.e. it is
independent of $r$. So $s$ is a function of $r$ alone, and $\eta$ is a function of $\varphi_1, \cdots, \varphi_{n-2}$
alone. Consequently $s$ and $\eta$ likewise are independent.
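The independence claimed in this appendix can be illustrated by a small simulation. The sketch below is not part of the original argument. It assumes the usual definitions $s^2 = \frac{1}{n}\sum (x_\mu - \bar{x})^2$ and $\delta^2 = \frac{1}{n-1}\sum (x_{\mu+1} - x_\mu)^2$ (section 1 is not reproduced here, so these are assumptions), and the sample size, number of replications and seed are arbitrary choices. It compares the first moment of $\delta^2$ with the product of the first moments of $s^2$ and $\eta$, and reports the sample correlation of $s^2$ and $\eta$.

```python
import random


def delta2_s2_eta(xs):
    """Return (delta^2, s^2, eta) for one sample, under the assumed definitions."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n  # assumed: s^2 = (1/n) sum (x - xbar)^2
    d2 = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)  # assumed delta^2
    return d2, s2, d2 / s2


def simulate(n=5, reps=20000, seed=0):
    """Draw `reps` standard normal samples of size n and compute the three statistics."""
    rng = random.Random(seed)
    return [delta2_s2_eta([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]


def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)


def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    va = mean((x - ma) ** 2 for x in a)
    vb = mean((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


draws = simulate()
d2 = [t[0] for t in draws]
s2 = [t[1] for t in draws]
eta = [t[2] for t in draws]

# Moment relation for p = 1: mean(delta^2) should be close to mean(s^2) * mean(eta).
lhs = mean(d2)
rhs = mean(s2) * mean(eta)
rel_gap = abs(lhs - rhs) / lhs
corr = correlation(s2, eta)
print(rel_gap, corr)
```

A small relative gap and a correlation near zero are consistent with independence, although a simulation of this kind cannot of course replace the proof given above.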

Added in proof: 

After this manuscript was completed, Dr. T. Koopmans informed the author 
of several results of his own, which he obtained in connection with other statistical 
investigations. They have many points of contact with this investigation, and 
will appear in the near future in the Annals of Mathematical Statistics. The 
author wishes to express his thanks to Dr. T. Koopmans for his communications. 



SOME EXAMPLES OF ASYMPTOTICALLY MOST POWERFUL TESTS 

By Abraham Wald 1 
Columbia University 

1. Introduction. In a previous paper 2 the author gave the definition of an
asymptotically most powerful test and showed that the commonly used tests,
based on the maximum likelihood estimate, are asymptotically most powerful.

In this paper some further examples of asymptotically most powerful tests
will be given. Let us first restate the definition of an asymptotically most
powerful test. Let $f(x, \theta)$ be the probability density of a variate $x$ involving an
unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by means of $n$
independent observations $x_1, \cdots, x_n$ on $x$ we have to choose a region of rejection
$W_n$ in the n-dimensional sample space. Denote by $P(W_n \mid \theta)$ the probability
that the sample point $E = (x_1, \cdots, x_n)$ will fall in $W_n$ under the assumption
that $\theta$ is the true value of the parameter. For any region $U_n$ of the
n-dimensional sample space denote by $g(U_n)$ the greatest lower bound of $P(U_n \mid \theta)$.
For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least upper bound of

$$P(U_n \mid \theta) - P(T_n \mid \theta).$$

In all that follows we shall denote a region of the n-dimensional sample space
by a capital letter with the subscript n.

Definition 1: A sequence $\{W_n\}$ $(n = 1, 2, \cdots,$ ad inf.$)$ of regions is said to be
an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of
significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for
which $P(Z_n \mid \theta_0) = \alpha$ the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$$

holds.

Definition 2: A sequence $\{W_n\}$ $(n = 1, 2, \cdots,$ ad inf.$)$ of regions is said to
be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$ on
the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n \to \infty} g(W_n) = \alpha$, and if for any sequence
$\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n \to \infty} g(Z_n) = \alpha$, the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$$

holds.

1 Research under a grant-in-aid of the Carnegie Corporation of New York.

2 “Asymptotically most powerful tests of statistical hypotheses,” Annals of Math. Stat.,
Vol. 12 (1941).

Consider the expression

$$y_n(\theta) = \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta). \tag{1}$$

Let $W'_n$ be the region defined by the inequality $y_n(\theta_0) > c'_n$, $W''_n$ defined by the
inequality $y_n(\theta_0) < c''_n$, and $W_n$ defined by the inequality $|y_n(\theta_0)| > c_n$, where
the constants $c'_n$, $c''_n$ and $c_n$ are chosen such that

$$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

It will be shown in this paper that under certain restrictions on the probability
density $f(x, \theta)$ the sequence $\{W'_n\}$ is an asymptotically most powerful test of the
hypothesis $\theta = \theta_0$ if $\theta$ takes only values $> \theta_0$. Similarly $\{W''_n\}$ is an
asymptotically most powerful test if $\theta$ takes only values $< \theta_0$. Finally $\{W_n\}$ is an
asymptotically most powerful unbiased test if $\theta$ can take any real value.
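The three regions just described can be made concrete for a simple model. The following sketch is illustrative only and is not taken from the paper: it assumes a normal density with unknown mean $\theta$ and known standard deviation `sigma`, for which $\frac{\partial}{\partial\theta} \log f(x, \theta) = (x - \theta)/\sigma^2$ and $y_n(\theta_0)$ is exactly normal with mean zero and variance $1/\sigma^2$ under the hypothesis. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def score_statistic(xs, theta0, sigma=1.0):
    """y_n(theta0) = n^(-1/2) * sum of d/dtheta log f(x, theta) at theta0, for N(theta, sigma^2)."""
    n = len(xs)
    return sum((x - theta0) / sigma**2 for x in xs) / sqrt(n)


def critical_values(alpha, sigma=1.0):
    """Constants c'_n, c''_n, c_n; in this model y_n(theta0) is exactly N(0, 1/sigma^2) under theta0."""
    sd = 1.0 / sigma
    upper = sd * STD_NORMAL.inv_cdf(1 - alpha)          # W'_n : y_n > c'_n
    lower = sd * STD_NORMAL.inv_cdf(alpha)              # W''_n: y_n < c''_n
    two_sided = sd * STD_NORMAL.inv_cdf(1 - alpha / 2)  # W_n  : |y_n| > c_n
    return upper, lower, two_sided


def reject(xs, theta0, alpha=0.05, sigma=1.0):
    """Report whether the sample point falls in each of the three regions."""
    y = score_statistic(xs, theta0, sigma)
    upper, lower, two_sided = critical_values(alpha, sigma)
    return {"W_prime": y > upper, "W_double_prime": y < lower, "W": abs(y) > two_sided}


sample = [0.9, 1.4, 0.2, 1.1, 1.9, 0.7, 1.3, 1.6, 0.5]
print(score_statistic(sample, theta0=0.0), reject(sample, theta0=0.0))
```

In this special model the constants do not depend on $n$; for a general density they would have to be found from the distribution of $y_n(\theta_0)$ under the hypothesis, or approximated by its normal limit.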

Another example of an asymptotically most powerful unbiased test of the
hypothesis $\theta = \theta_0$, as will be shown, is the critical region of type A in the
Neyman-Pearson theory of testing hypotheses. This fact gives a strong
justification for the use of the critical region of type A.

2. Assumptions on the density function. Let $\omega$ be a subset of the real axis.
Denote by $\theta^*$ a real variable which takes only values in $\omega$ and let $\theta$ be a variable
which can take any real value. For any function $\psi(x)$ we denote by $E_\theta \psi(x)$ the
expected value of $\psi(x)$ under the assumption that $\theta$ is the true value of the
parameter, i.e.

$$E_\theta \psi(x) = \int_{-\infty}^{+\infty} \psi(x) f(x, \theta) \, dx.$$

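For any $x$, for any positive $\delta$ and for any real value $\theta_1$ denote by $\varphi_1(x, \theta_1, \delta)$ the
greatest lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\frac{\partial^2}{\partial\theta^2} \log f(x, \theta)$
in the interval $\theta_1 - \delta \le \theta \le \theta_1 + \delta$. In all that follows the symbol $\theta^*_i$, for
any integer $i$, will denote a value of $\theta^*$, i.e., $\theta^*_i$ is a point of $\omega$.

We say that a value $\theta$ lies in the $\epsilon$-neighborhood of $\omega$ if there exists a value $\theta^*$
such that $|\theta - \theta^*| < \epsilon$.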

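Throughout the paper the following assumptions on $f(x, \theta)$ will be made:

Assumption 1: For any pair of sequences $\{\theta_n\}$ and $\{\theta^*_n\}$ $(n = 1, 2, \cdots,$ ad inf.$)$
for which

$$\lim_{n \to \infty} E_{\theta_n} \frac{\partial}{\partial\theta} \log f(x, \theta^*_n) = 0$$

also

$$\lim_{n \to \infty} (\theta_n - \theta^*_n) = 0.$$

Furthermore there exists a positive $\epsilon$ such that $E_\theta \left[ \frac{\partial}{\partial\theta} \log f(x, \theta_1) \right]^2$ is a bounded
function of $\theta$ and $\theta_1$, $E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta_1)$ is a continuous function of $\theta$ and $\theta_1$ and
$E_{\theta_1} \left[ \frac{\partial}{\partial\theta} \log f(x, \theta_1) \right]^2 = d(\theta_1)$ has a positive lower bound, where $\theta_1$ can take any value
in the $\epsilon$-neighborhood of $\omega$.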

Assumption 2: There exists a positive $k_0$ such that $E_{\theta_2} \varphi_1(x, \theta_1, \delta)$ and
$E_{\theta_2} \varphi_2(x, \theta_1, \delta)$ are uniformly continuous functions in the domain $D$ defined as
follows: the variables $\theta_1$ and $\theta_2$ may take any value in the $k_0$-neighborhood of $\omega$ and $\delta$
may take any value for which $|\delta| < k_0$. Furthermore it is assumed that

$$E_{\theta_2} [\varphi_i(x, \theta_1, \delta)]^2, \qquad (i = 1, 2)$$

are bounded functions of $\theta_1$, $\theta_2$ and $\delta$ in $D$.

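Assumption 3: There exists a positive $k_0$ such that

$$\int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} f(x, \theta) \, dx = \int_{-\infty}^{+\infty} \frac{\partial^2}{\partial\theta^2} f(x, \theta) \, dx = 0$$

for all $\theta$ in the $k_0$-neighborhood of $\omega$.

Assumption 3 means simply that we may differentiate with respect to $\theta$ under
the integral sign. In fact,

$$\int_{-\infty}^{+\infty} f(x, \theta) \, dx = 1,$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta} \int_{-\infty}^{+\infty} f(x, \theta) \, dx = \frac{\partial^2}{\partial\theta^2} \int_{-\infty}^{+\infty} f(x, \theta) \, dx = 0.$$

Differentiating under the integral sign we obtain the relations in Assumption 3.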
Assumption 4: There exists a positive $k_0$ and a positive $\eta$ such that

is a bounded function of $\theta$ in the $k_0$-neighborhood of $\omega$.

3. Some propositions. Proposition 1: To any positive $\beta$ there exists a
positive $\gamma$ such that

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{\sqrt{n}} y_n(\theta^*) \right| > \gamma \;\Big|\; \theta \right\} = 1$$

uniformly in $\theta^*$ and for all $\theta$ for which $|\theta - \theta^*| > \beta$.

Proof: From Assumption 1 it follows that $\left| E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta^*) \right|$ has a positive
lower bound in the domain $|\theta - \theta^*| > \beta$. Since according to Assumption 1
$E_\theta \left[ \frac{\partial}{\partial\theta} \log f(x, \theta^*) \right]^2$ is a bounded function of $\theta$ and $\theta^*$, Proposition 1 easily follows.
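Proposition 2: There exists a positive $\epsilon$ such that

$$\lim_{n \to \infty} P[y_n(\theta) < t \mid \theta] = N(t \mid \theta)$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$ where

$$d(\theta) = -E_\theta \frac{\partial^2}{\partial\theta^2} \log f(x, \theta) = E_\theta \left[ \frac{\partial}{\partial\theta} \log f(x, \theta) \right]^2 \tag{2}$$

and

$$N(t \mid \theta) = \frac{1}{\sqrt{2\pi \, d(\theta)}} \int_{-\infty}^{t} e^{-\frac{1}{2} v^2 / d(\theta)} \, dv. \tag{3}$$

Proposition 2 follows easily from Assumptions 3 and 4 and the general limit
theorems.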
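The two expressions for $d(\theta)$ in (2) can be checked numerically for a specific density. The sketch below is an illustration under stated assumptions, not part of the paper: it takes the exponential density $f(x, \theta) = \theta e^{-\theta x}$ on $x \ge 0$ as a hypothetical example, uses the analytic derivatives of $\log f$, and approximates the expectations by a midpoint rule on a truncated range. Both expressions should be close to $1/\theta^2$, and the mean of the score should be close to zero, as Assumption 3 implies.

```python
from math import exp


def density(x, theta):
    """Assumed example density: f(x, theta) = theta * exp(-theta * x) for x >= 0."""
    return theta * exp(-theta * x)


def score(x, theta):
    """d/dtheta log f(x, theta) = 1/theta - x."""
    return 1.0 / theta - x


def second_derivative(x, theta):
    """d^2/dtheta^2 log f(x, theta) = -1/theta^2 (constant in x for this density)."""
    return -1.0 / theta**2


def expectation(func, theta, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of E_theta[func(x)] on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += func(x) * density(x, theta)
    return total * h


theta = 2.0
d_from_curvature = -expectation(lambda x: second_derivative(x, theta), theta)
d_from_score_sq = expectation(lambda x: score(x, theta) ** 2, theta)
mean_score = expectation(lambda x: score(x, theta), theta)
print(d_from_curvature, d_from_score_sq, mean_score)
```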

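Proposition 3: There exists a positive $\epsilon$ such that for any bounded sequence $\{\mu_n\}$

$$\lim_{n \to \infty} \left\{ P\left[ y_n(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] - \int_{-\infty}^{t - \mu_n d(\theta)} dN(v \mid \theta) \right\} = 0$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$.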

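Proof: We have

$$y_n\left( \theta + \frac{\mu_n}{\sqrt{n}} \right) = y_n(\theta) + \frac{\mu_n}{\sqrt{n}} \cdot \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \tag{4}$$

where $\theta'_n$ lies in the interval $\left[ \theta, \theta + \frac{\mu_n}{\sqrt{n}} \right]$. From Assumption 2 and the above
equation we easily obtain

$$\lim_{n \to \infty} \left\{ P\left[ y_n\left( \theta + \frac{\mu_n}{\sqrt{n}} \right) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] - P\left[ y_n(\theta) - \mu_n d(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{5}$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. From Proposition 2
and (5) we get

$$\lim_{n \to \infty} \left\{ \int_{-\infty}^{t} dN(v \mid \theta) - P\left[ y_n(\theta) < t + \mu_n d(\theta) \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0$$

or

$$\lim_{n \to \infty} \left\{ \int_{-\infty}^{t - \mu_n d(\theta)} dN(v \mid \theta) - P\left[ y_n(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{6}$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. This proves
Proposition 3.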

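Proposition 4: There exists a positive $\epsilon$ such that for any positive $\gamma$ and for
any sequence $\{\mu_n\}$ for which $\lim |\mu_n| = \infty$

$$\lim_{n \to \infty} P\left\{ |y_n(\theta^*)| > \gamma \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right\} = 1$$

uniformly in $\theta^*$.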

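Proof: If there exists a positive $\beta$ such that $\frac{|\mu_n|}{\sqrt{n}} > \beta$ for almost all $n$,
Proposition 4 follows from Proposition 1. Hence we have to consider only the
case $\lim_{n \to \infty} \frac{\mu_n}{\sqrt{n}} = 0$. Since

$$E_{\theta^* + \mu_n/\sqrt{n}} \left[ y_n\left( \theta^* + \frac{\mu_n}{\sqrt{n}} \right) \right] = 0$$

we get from (4)

$$E_{\theta^* + \mu_n/\sqrt{n}} [y_n(\theta^*)] + \mu_n E_{\theta^* + \mu_n/\sqrt{n}} \left[ \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \right] = 0. \tag{7}$$

Since $\lim \frac{\mu_n}{\sqrt{n}} = 0$, we have on account of Assumption 2

$$\lim_{n \to \infty} E_{\theta^* + \mu_n/\sqrt{n}} \left[ \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \right] = E_{\theta^*} \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) = -E_{\theta^*} \left[ \frac{\partial}{\partial\theta} \log f(x, \theta^*) \right]^2 = -d(\theta^*)$$

uniformly in $\theta^*$. According to Assumption 1 $d(\theta^*)$ has a positive lower bound;
hence on account of $\lim |\mu_n| = \infty$ we obtain from (7)

$$\lim_{n \to \infty} \left| E_{\theta^* + \mu_n/\sqrt{n}} \, y_n(\theta^*) \right| = \infty \tag{8}$$

uniformly in $\theta^*$. The variance of $y_n(\theta^*)$ is equal to the variance of $\frac{\partial}{\partial\theta} \log f(x, \theta^*)$.
On account of Assumption 1 the variance of $\frac{\partial}{\partial\theta} \log f(x, \theta^*)$ (under the assumption
that $\theta^* + \frac{\mu_n}{\sqrt{n}}$ is the true value of the parameter) is a bounded function. Hence
Proposition 4 is proved on account of (8).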

Proposition 5: Let $\{W_n(\theta^*)\}$ be a sequence of regions of size $\alpha$, i.e.
$P[W_n(\theta^*) \mid \theta^*] = \alpha$, and let $V_n(\theta^*, y)$ be the region defined by the inequality
$y_n(\theta^*) < y$. Let $U_n(\theta^*, y)$ be the intersection of $V_n(\theta^*, y)$ and $W_n(\theta^*)$ and denote
$P[U_n(\theta^*, y) \mid \theta^*]$ by $F_n(y \mid \theta^*)$. Denote furthermore $P\left[ W_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu}{\sqrt{n}} \right]$ by
$G(\theta^*, \mu, n)$. If $\{\theta^*_n\}$ and $\{\mu_n\}$ are two sequences such that $\lim d(\theta^*_n) = d$;
$\lim F_n(y \mid \theta^*_n) = F(y)$ and $\lim \mu_n = \mu$ then

$$\lim_{n \to \infty} G(\theta^*_n, \mu_n, n) = \int_{-\infty}^{+\infty} e^{\mu y - \frac{1}{2} \mu^2 d} \, dF(y).$$

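Proof: Let $\lim \mu_n = \mu$ and consider the Taylor expansion

$$\sum_{\alpha} \log f\left( x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}} \right) = \sum_{\alpha} \log f(x_\alpha, \theta^*) + \frac{\mu_n}{\sqrt{n}} \sum_{\alpha} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) + \frac{1}{2} \frac{\mu_n^2}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \tag{9}$$

where $\theta'_n$ lies in the interval $\left[ \theta^*, \theta^* + \frac{\mu_n}{\sqrt{n}} \right]$. From this we easily get on account
of Assumption 2 and the fact that $\{\mu_n\}$ is bounded

$$\log \prod_{\alpha=1}^{n} \frac{f\left( x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}} \right)}{f(x_\alpha, \theta^*)} = \mu_n y_n(\theta^*) - \frac{1}{2} \mu_n^2 d(\theta^*) + \epsilon(\theta^*, n) \tag{10}$$

where for arbitrary positive $\nu$

$$\lim_{n \to \infty} P\left\{ |\epsilon(\theta^*, n)| < \nu \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right\} = 1 \tag{11}$$

uniformly in $\theta^*$. Denote by $R_n(\theta^*)$ the region defined by

$$|\epsilon(\theta^*, n)| < \eta, \qquad \eta > 0. \tag{12}$$

On account of (11) we have

$$\lim_{n \to \infty} P\left[ R_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] = 1, \tag{13}$$

uniformly in $\theta^*$. Denote the intersection of $R_n(\theta^*)$ and $W_n(\theta^*)$ by $Q_n(\theta^*)$, and
the intersection of $R_n(\theta^*)$ and $U_n(\theta^*, y)$ by $T_n(\theta^*, y)$. Furthermore denote
$P[T_n(\theta^*, y) \mid \theta^*]$ by $P_n(y \mid \theta^*)$. Then we have

$$e^{-\eta} \int_{-\infty}^{t} e^{\mu_n y - \frac{1}{2} \mu_n^2 d(\theta^*)} \, dP_n(y \mid \theta^*) \le P\left[ T_n(\theta^*, t) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] \le e^{\eta} \int_{-\infty}^{t} e^{\mu_n y - \frac{1}{2} \mu_n^2 d(\theta^*)} \, dP_n(y \mid \theta^*) \tag{14}$$

for all values of $t$ and $\theta^*$. Furthermore we obviously have

$$\lim_{n \to \infty} \left\{ G(\theta^*, \mu_n, n) - P\left[ Q_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{15}$$

uniformly in $\theta^*$, and

$$\lim_{n \to \infty} [P_n(t \mid \theta^*) - F_n(t \mid \theta^*)] = 0 \tag{16}$$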

uniformly in $\theta^*$ and $t$. Since $\eta$ may be chosen arbitrarily small, it follows from
(14) and (15) that to any $\epsilon > 0$, $\eta$ may be chosen such that


(17) 


lim sup 


G(8*„, h„ , n) - f + °° dP n (t 1 0* n ) 


for any sequence $\{\theta^*_n\}$.

To each $\epsilon$ let $L_\epsilon$ be a positive number such that $L_\epsilon$ depends only on $\epsilon$ and


(18) 


£ L ‘ dN(t \ 6*) + j°° dN(t 1 0*) < ? 


for all $n$ and for all values of $\theta^*$. Since $d(\theta^*)$ has a positive lower and a finite
upper bound, it is easy to verify that such an $L_\epsilon$ exists. From (18) and
Proposition 3 it follows


(19) 


lim sup \P\y n {9*) < -L«| 0 ! + 

L VnJ 


' % 


+ P| J/n(0 > L, | 0* + ^ 


for any arbitrary sequence $\{\theta^*_n\}$. Since the difference $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ is
a subset of the difference $V_n(\theta^*, t_2) - V_n(\theta^*, t_1)$ and since $T_n(\theta^*, t_2) - T_n(\theta^*, t_1)$
is a subset of $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ for $t_2 > t_1$, we get from (18) and (19)


( 20 ) 


lim sup \p U n {0*, —L,) | 0 * 

n-+oo ^ L 


* ^ Mn 


\/n 


1 + p|Vn(O|0! + -^7= 

J L V«J 


- p[c/„(0:,l.)|0; + \ 

limsup |p[r n ( 0 ;, —L«) I 0* + + p[q b ( 0 !) 1 0*n + 

for any sequence $\{\theta^*_n\}$. On account of (14) we get from (21)

( 22 ) e~’ lim sup / [ *' dP n (t | 0*) + [" dP n (t | «!)) < e -. 

n-*to J 2 


and 


( 21 ) 
From (17) and (22) we obtain


(23) lim sup CKti, M «, ») - [ L ‘ dP n (t | ^) | < « 

for any sequence $\{\theta^*_n\}$. Consider now the sequence $\{\theta^*_n\}$ which satisfies the
conditions of Proposition 5. Since $F_n(t \mid \theta^*_n)$ converges to $F(t)$ uniformly in $t$,
on account of (16) also $P_n(t \mid \theta^*_n)$ converges to $F(t)$ uniformly in $t$. Hence we
obtain from (23)


(24) 


lim sup G(dt, nn, 



dF{f) 



Since $\epsilon$ and $\eta$ may be chosen arbitrarily small, Proposition 5 follows from (24).


4. Some theorems and corollaries. Theorem 1: Denote by $S_n(\theta^*)$ the region
defined by the inequality $y_n(\theta^*) > A_n(\theta^*)$ where $A_n(\theta^*)$ is chosen such that
$P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region $W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper
bound of $P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$ with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted
to values $> \theta^*$. Then for any sequence $\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$

Proof: Assume that Theorem 1 is not true. Then there exists a sequence
of integers $\{n'\}$, a sequence $\{\theta^*_{n'}\}$ and a sequence $\{\theta_{n'}\}$ $(\theta_{n'} > \theta^*_{n'})$ such that

$$\lim_{n' \to \infty} \left\{ P[W_{n'}(\theta^*_{n'}) \mid \theta_{n'}] - P[S_{n'}(\theta^*_{n'}) \mid \theta_{n'}] \right\} = \delta > 0. \tag{25}$$

On account of Proposition 2 and Assumption 2 the sequence $\{A_{n'}(\theta^*_{n'})\}$ is
bounded. Then it follows easily from (25) and Proposition 4 (taking into account
that $E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta^*) > 0$ for $\theta > \theta^*$) that

$$(\theta_{n'} - \theta^*_{n'}) \sqrt{n'} = \mu_{n'} > 0 \tag{26}$$

must be bounded. Denote by $\{n''\}$ a subsequence of $\{n'\}$ such that

$$\lim d(\theta^*_{n''}) = d, \tag{27}$$

$$\lim \mu_{n''} = \mu, \text{ and} \tag{28}$$

$$\lim F_{n''}(t \mid \theta^*_{n''}) = F(t) \tag{29}$$

uniformly in $t$ where

$$F_n(t \mid \theta^*) = P[U_n(\theta^*, t) \mid \theta^*]$$

and $U_n(\theta^*, t)$ is the intersection of $W_n(\theta^*)$ and the region $y_n(\theta^*) < t$. The
existence of a subsequence $\{n''\}$ such that (29) holds follows from the fact that

$$F_n(t_2 \mid \theta^*) - F_n(t_1 \mid \theta^*) \le \Phi_n(t_2 \mid \theta^*) - \Phi_n(t_1 \mid \theta^*) \quad \text{for } t_2 > t_1, \tag{30}$$

and

$$\lim_{n'' \to \infty} \Phi_{n''}(t \mid \theta^*_{n''}) = \frac{1}{\sqrt{2\pi d}} \int_{-\infty}^{t} e^{-\frac{1}{2} v^2 / d} \, dv = N(t), \tag{31}$$

where $\Phi_n(t \mid \theta^*)$ denotes the probability $P[y_n(\theta^*) < t \mid \theta^*]$. Furthermore it can
easily be shown that

$$\int_{-\infty}^{+\infty} dF(t) = \alpha. \tag{32}$$

On account of Proposition 5 we get from (25), (27), (28), (29), (30) and (31)

$$\int_{-\infty}^{+\infty} e^{\mu t - \frac{1}{2} \mu^2 d} \, dF(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d} \, dN(t) = \delta, \tag{33}$$

where $A$ denotes a value such that

$$\int_{A}^{\infty} dN(t) = \alpha.$$

It has been shown in a previous paper 3 that (33) leads to a contradiction. Hence
Theorem 1 is proved.

Theorem 2: Denote by $S_n(\theta^*)$ the region defined by the inequality $y_n(\theta^*) < A_n(\theta^*)$
where $A_n(\theta^*)$ is chosen such that $P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region
$W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$$

with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted to values $< \theta^*$. Then for any sequence
$\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$

The proof is omitted, since it is analogous to that of Theorem 1.

Theorem 3: Let $\{W_n(\theta^*)\}$ be for each $\theta^*$ a sequence of regions for which
$P[W_n(\theta^*) \mid \theta^*] = \alpha$ and $\lim_{n \to \infty} g[W_n(\theta^*)] = \alpha$ uniformly in $\theta^*$. Denote by $L_n[W_n(\theta^*)]$
the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[\,|y_n(\theta^*)| > A_n(\theta^*) \mid \theta]$$

with respect to $\theta$ and $\theta^*$, where $A_n(\theta^*)$ is chosen such that

$$P[\,|y_n(\theta^*)| > A_n(\theta^*) \mid \theta^*] = \alpha.$$

Then

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$


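3 See p. 12 of the paper cited in 2.

Proof: Denote $P[y_n(\theta^*) < t \mid \theta^*]$ by $\Phi_n(t \mid \theta^*)$ and denote by $F_n(t \mid \theta^*)$ the
probability (under the hypothesis $\theta = \theta^*$) of the intersection of $W_n(\theta^*)$ with
the region $y_n(\theta^*) < t$. Assume that Theorem 3 is not true. Then there exists
a subsequence $\{n''\}$, a sequence $\{\theta^*_{n''}\}$ and a sequence $\{\theta_{n''}\}$ such that

$$\lim d(\theta^*_{n''}) = d; \qquad \lim (\theta_{n''} - \theta^*_{n''}) \sqrt{n''} = \lim \mu_{n''} = \mu;$$

$$\lim_{n'' \to \infty} F_{n''}(t \mid \theta^*_{n''}) = F(t)$$

uniformly in $t$, and

$$\int_{-\infty}^{+\infty} e^{\mu t - \frac{1}{2} \mu^2 d} \, dF(t) - \int_{-\infty}^{-A} e^{\mu t - \frac{1}{2} \mu^2 d} \, dN(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d} \, dN(t) = \delta > 0 \tag{34}$$

where $A$ is a positive number such that

$$\int_{-\infty}^{-A} dN(t) + \int_{A}^{\infty} dN(t) = \alpha, \qquad N(t) = \frac{1}{\sqrt{2\pi d}} \int_{-\infty}^{t} e^{-\frac{1}{2} v^2 / d} \, dv.$$

This can be proved in the same way as (33) has been proved. The author has
shown in a previous paper 4 that (34) leads to a contradiction. Hence Theorem
3 is proved.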

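Theorem 4: Denote by $A_n(\theta^*)$ the region of type 5 A of size $\alpha$ for testing the
hypothesis $\theta = \theta^*$. Denote by $B_n(\theta^*)$ the region $|y_n(\theta^*)| > C_n(\theta^*)$ where $C_n(\theta^*)$
is determined such that

$$P[\,|y_n(\theta^*)| > C_n(\theta^*) \mid \theta^*] = \alpha.$$

Then, under the assumption that $E_\theta \left[ \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) \right]^2$ is bounded,

$$\lim_{n \to \infty} \left\{ P[A_n(\theta^*) \mid \theta] - P[B_n(\theta^*) \mid \theta] \right\} = 0$$

uniformly in $\theta$ and $\theta^*$.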

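Proof: The region $A_n(\theta^*)$ is given by the inequality 6

$$\left[ \sum_{\alpha} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) \right]^2 + \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > k'_n(\theta^*) \left[ \sum_{\alpha} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) \right] + k''_n(\theta^*), \tag{35}$$

where $k'_n(\theta^*)$ and $k''_n(\theta^*)$ are chosen such that $A_n(\theta^*)$ should be unbiased and of
size $\alpha$. The inequality (35) can be written also in the form

$$[y_n(\theta^*)]^2 + \frac{1}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > l'_n(\theta^*) y_n(\theta^*) + l''_n(\theta^*). \tag{36}$$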

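4 See p. 14 of the paper cited in 2.

5 Neyman, J. and Pearson, E. S., “Contributions to the theory of testing statistical
hypotheses,” Stat. Res. Mem., Vol. 1.

6 See the paper cited in 5.

Let $\{\mu_n\}$ be a bounded sequence. From Assumption 2 it follows that for any
positive $\epsilon$

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) + d(\theta^*) \right| < \epsilon \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right\} = 1 \tag{37}$$

uniformly in $\theta^*$. Since (37) holds for arbitrarily small $\epsilon$, we get easily on
account of Proposition 3

$$\lim_{n \to \infty} \left\{ P\left[ A_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] - P\left[ A'_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{38}$$

uniformly in $\theta^*$, where $A'_n(\theta^*)$ is defined by

$$[y_n(\theta^*)]^2 > l'_n(\theta^*) y_n(\theta^*) + l''_n(\theta^*) + d(\theta^*). \tag{39}$$

Since $A_n(\theta^*)$ is unbiased and of size $\alpha$, we have on account of (38) and (39)

$$\lim l'_n(\theta^*) = 0 \text{ and} \tag{40}$$

$$\lim \{ l''_n(\theta^*) + d(\theta^*) \} = \lambda(\theta^*) > 0 \tag{41}$$

uniformly in $\theta^*$, where $\lambda(\theta^*)$ is given by the condition

$$\frac{1}{\sqrt{2\pi d(\theta^*)}} \int_{-\sqrt{\lambda(\theta^*)}}^{+\sqrt{\lambda(\theta^*)}} e^{-\frac{1}{2} t^2 / d(\theta^*)} \, dt = 1 - \alpha. \tag{42}$$

Inequality (39) is obviously equivalent to the alternative

$$y_n(\theta^*) < c'_n(\theta^*) \quad \text{or} \quad y_n(\theta^*) > c''_n(\theta^*)$$

where $c'_n(\theta^*)$ and $c''_n(\theta^*)$ are the roots of the equation in $y_n(\theta^*)$

$$[y_n(\theta^*)]^2 = l'_n(\theta^*) y_n(\theta^*) + l''_n(\theta^*) + d(\theta^*).$$

Since

$$\lim c'_n(\theta^*) = -\sqrt{\lambda(\theta^*)} \quad \text{and} \quad \lim c''_n(\theta^*) = +\sqrt{\lambda(\theta^*)}$$

uniformly in $\theta^*$, from Proposition 3 it follows that

$$\lim_{n \to \infty} \left\{ P\left[ A_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] - \int_{-\infty}^{-\sqrt{\lambda(\theta^*)} - \mu_n d(\theta^*)} dN(t \mid \theta^*) - \int_{\sqrt{\lambda(\theta^*)} - \mu_n d(\theta^*)}^{+\infty} dN(t \mid \theta^*) \right\} = 0 \tag{43}$$

uniformly in $\theta^*$.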


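Now let us consider a sequence $\{\nu_n\}$ such that $\lim |\nu_n| = \infty$ and $\lim \frac{\nu_n}{\sqrt{n}} = 0$.
We shall prove that

$$\lim_{n \to \infty} P\left[ A_n(\theta^*) \;\Big|\; \theta^* + \frac{\nu_n}{\sqrt{n}} \right] = 1 \tag{44}$$

uniformly in $\theta^*$. Since $E_\theta \left[ \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) \right]^2$ is assumed to be bounded,

$$E_{\theta^* + \nu_n/\sqrt{n}} \left[ \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) \right] \tag{45}$$

and

$$E_{\theta^* + \nu_n/\sqrt{n}} \left[ \frac{1}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) \right]^2 \tag{46}$$

are bounded functions of $\theta^*$ and $n$. We get by Taylor expansion

$$\sum_{\alpha} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = \sum_{\alpha} \frac{\partial}{\partial\theta} \log f\left( x_\alpha, \theta^* + \frac{\nu_n}{\sqrt{n}} \right) - \frac{\nu_n}{\sqrt{n}} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \tag{47}$$

where $\theta'_n$ lies in $\left[ \theta^*, \theta^* + \frac{\nu_n}{\sqrt{n}} \right]$. Hence

$$E_{\theta^* + \nu_n/\sqrt{n}} [y_n(\theta^*)] = -\nu_n E_{\theta^* + \nu_n/\sqrt{n}} \left[ \frac{1}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \right]. \tag{48}$$

From Assumption 2 and $\lim |\nu_n| = \infty$ it follows that the absolute value of
the right hand side of (48) converges to $\infty$. Hence

$$\lim_{n \to \infty} \left| E_{\theta^* + \nu_n/\sqrt{n}} [y_n(\theta^*)] \right| = \infty.$$

Since on account of Assumption 1

$$E_{\theta^* + \nu_n/\sqrt{n}} \left[ \frac{\partial}{\partial\theta} \log f(x, \theta^*) \right]^2$$

is a bounded function of $n$ and $\theta^*$, also the variance of $y_n(\theta^*)$ (under the
assumption that $\theta = \theta^* + \nu_n/\sqrt{n}$ is the true value of the parameter) is a bounded
function of $n$ and $\theta^*$. Hence for any arbitrarily large constant $C$

$$\lim_{n \to \infty} P\left\{ |y_n(\theta^*)| > C \;\Big|\; \theta^* + \frac{\nu_n}{\sqrt{n}} \right\} = 1, \tag{49}$$

uniformly in $\theta^*$. The equation (44) follows easily from (36), (40), (41), (45),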
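(46) and (49).

Consider a sequence $\{\rho_n\}$ such that $\frac{|\rho_n|}{\sqrt{n}} > \beta > 0$ for all $n$. Then it follows
easily from Proposition 1 that for any arbitrary $C$

$$\lim_{n \to \infty} P\left\{ |y_n(\theta^*)| > C \;\Big|\; \theta^* + \frac{\rho_n}{\sqrt{n}} \right\} = 1 \tag{50}$$

uniformly in $\theta^*$. Since $E_\theta \left[ \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) \right]^2$ is assumed to be bounded, and
therefore also $E_\theta \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*)$ is bounded, there exists a finite $g$ such that

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{\alpha} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) \right| < g \;\Big|\; \theta^* + \frac{\rho_n}{\sqrt{n}} \right\} = 1 \tag{51}$$

uniformly in $\theta^*$. From (36), (40), (41), (50) and (51) it follows

$$\lim_{n \to \infty} P\left[ A_n(\theta^*) \;\Big|\; \theta^* + \frac{\rho_n}{\sqrt{n}} \right] = 1 \tag{52}$$

uniformly in $\theta^*$. Since on account of Propositions 3 and 4, the relations (43),
(44) and (52) hold if we substitute $B_n(\theta^*)$ for $A_n(\theta^*)$, Theorem 4 is proved.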

If Assumptions 1-4 are fulfilled for the set $\omega$ consisting of the single point
$\theta = \theta_0$, then we get from Theorems 1-4 the following corollaries:

Corollary 1: Let $W'_n$ be the region defined by the inequality $y_n(\theta_0) > c'_n$,
$W''_n$ defined by the inequality $y_n(\theta_0) < c''_n$, and $W_n$ defined by the inequality
$|y_n(\theta_0)| > c_n$, where the constants $c'_n$, $c''_n$ and $c_n$ are chosen such that

$$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

Then $\{W'_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ if $\theta$
takes only values $> \theta_0$. Similarly $\{W''_n\}$ is an asymptotically most powerful test
if $\theta$ takes only values $< \theta_0$. Finally $\{W_n\}$ is an asymptotically most powerful
unbiased test if $\theta$ can take any real value.

Corollary 2: The sequence $\{A_n(\theta_0)\}$ is an asymptotically most powerful
unbiased test of the hypothesis $\theta = \theta_0$, where $A_n(\theta_0)$ denotes the critical region of
type A for testing $\theta = \theta_0$.



ON THE DISTRIBUTION OF THE QUOTIENT OF TWO CHANCE 

VARIABLES 

By J. H. Curtiss

Cornell University 

1. Introduction. Although the quotient of two chance variables appears fre¬ 
quently in mathematical statistics, the methods used in the literature to derive 
the distributions of quotients have usually been special ones devised for the 
particular variables under consideration, and in no way indicative of the general 
result. It is the purpose of this paper to study the distribution of the quotient 
of two variables for itself alone, with attention first to the question of existence, 
and then to the accurate derivation of a number of general formulas for the 
frequency function and d.f. 1 The principal formulas which we shall derive may 
be described briefly as follows (the numerals refer to the equation numbers in 
the text): 

(3.1). The frequency function of the quotient of two variables which have an
absolutely continuous joint probability function.

(4.11), (4.12). The d.f. of the quotient of a pair of arbitrary independent
variables, expressed in terms of the d.f.'s of these variables.

(5.2). The d.f. of the quotient of a pair of arbitrary independent variables,
expressed in terms of the c.f.'s 2 of these variables.

(6.4). The limiting form of the d.f. of a quotient of two sums of arbitrary
identical independent variables.

(7.1). A formula analogous to (3.1) for the product of two chance variables.

(7.2). A formula analogous to (4.11) for the product of two chance variables.

2. The existence of the quotient distribution. The function $Z = X/Y$ is a
continuous function of $X$ and $Y$, finite and uniquely defined for all points
$(X, Y)$ such that $Y \neq 0$. Therefore if $P\{Y = 0\} = 0$, the pr.f. 3 $P(S)$ of the
joint distribution of $X$ and $Y$ determines a probability distribution for $Z$ (see
[1, pp. 12-13]). To avoid irrelevant difficulties, we shall assume in the sequel
that $P\{Y = 0\} = 0$ unless definite statement is made to the contrary. This
assumption involves no real restriction on our work, for in situations in which,
a priori, the assumption is not fulfilled, we can always replace the distribution
of $Y$ by the conditional distribution of $Y$ relative to the hypothesis that $Y \neq 0$.
In such cases, then, the distribution of $Z$ which we are about to study is to be
interpreted as a conditional distribution relative to this hypothesis.

1 I.e., distribution function. The underlying axioms, terminology, and abbreviations
in this paper are uniform with those of Cramér's book [1]. For the definition of d.f., see
[1, p. 11].

2 I.e., characteristic functions. See [1, p. 23].

3 I.e., probability function; [1, p. 9].

We shall suppose that the space of $X$ is the x-axis, that of $Y$, the y-axis, and
that of $Z$, the z-axis. It is quite readily seen that the set of points in the $(x, y)$
plane which corresponds to the set $Z \le z$ consists of

(i) the infinite region 4 in the upper half-plane which is bounded by the
negative x-axis and by the line $x = zy$;

(ii) the infinite region in the lower half-plane bounded by the positive x-axis
and the line $x = zy$;

(iii) the line $x = zy$ except for the origin.

Denoting this set by $S_z$, we have

$$H(z) = \int_{S_z} dP(S) = P(S_z),$$

where $H(z)$ is the d.f. of $Z$. The present paper, from the viewpoint of analysis,
is simply a study of the Lebesgue-Stieltjes integral appearing in this equation.


3. The continuous case. Suppose first that $P(S)$ is absolutely continuous.
This means that the joint distribution of $X$ and $Y$ has a frequency function
$\varphi(x, y)$, which is defined almost everywhere, is non-negative, and has the
property that $P(S) = \int_S \varphi(x, y) \, dx \, dy$. In general, this integral must be taken in
the Lebesgue sense, but of course if the discontinuities of $\varphi$ form a set of
two-dimensional measure zero, and if the Jordan content of any bounded portion of
the boundary of $S$ is zero, then this integral is just an ordinary improper double
Riemann integral. 5 In particular, these conditions are fulfilled if $\varphi$ is continuous
everywhere and if $S = S_z$.

The transformation $x = uv$, $y = v$, gives a continuous one-to-one map of $S_z$
onto a set of the $(u, v)$ plane which consists of the closed half-plane lying to
the left of the line $u = z$, but with the u-axis deleted. The Jacobian of the
transformation has the absolute value $|v|$. By the theorem for change of
variables in Lebesgue integrals [4, pp. 653-655], we have

$$H(z) = \int_{S_z} \varphi(x, y) \, dx \, dy = \int_{u \le z, \; v \ne 0} |v| \, \varphi(uv, v) \, du \, dv.$$

By Fubini's Theorem [6, pp. 203-208], the last integral can be expressed as a
repeated integral. Integrating first with respect to $v$, we obtain this result.

Theorem 3.1: If the joint variable $(X, Y)$ has the frequency function $\varphi(x, y)$,
then

$$H(z) = \int_{-\infty}^{z} du \int_{-\infty}^{+\infty} |v| \, \varphi(uv, v) \, dv,$$

and consequently $H(z)$ is an absolutely continuous function of $z$. The frequency
function of the distribution of $Z$ exists almost everywhere, and is given by the
formula

$$h(z) = H'(z) = \int_{-\infty}^{+\infty} |v| \, \varphi(zv, v) \, dv. \tag{3.1}$$

4 I.e., open connected set.

5 See [4, pp. 476-478; p. 575].

We remark that if $X$ and $Y$ are independent, so that $\varphi(x, y) = f(x) \cdot g(y)$,
where $f$ and $g$ are respectively the frequency functions of $X$ and $Y$, then (3.1)
may be written in the form

$$h(z) = \int_{-\infty}^{+\infty} |v| \, f(zv) \, g(v) \, dv. \tag{3.2}$$

This case was considered recently by Huntington [5], with the additional
restrictions that $g(y) = 0$, $y < 0$, and that $f(x)$ and $g(y)$ be continuous.

All the familiar special quotient distributions of applied mathematical
statistics, such as Student's $t$ and Fisher's $z$, may conveniently and rigorously be
derived by means of (3.1) and (3.2); in each case the required result follows
immediately after an obvious change of variables in the integrand. We pause
here only to point out explicitly the result obtained when $X$ and $Y$ have a normal
joint distribution with variances $\sigma_X^2$, $\sigma_Y^2$, and correlation coefficient $\rho$. If the
means $E(X)$ and $E(Y)$ are not equal to zero, it is apparently impossible to
evaluate (3.1) in closed form; this case has been studied in some detail by
Geary [3] and by Fieller [2]. But if $E(X) = E(Y) = 0$, then

$$h(z) = \frac{\sigma_X \sigma_Y \sqrt{1 - \rho^2}}{\pi} \cdot \frac{1}{\sigma_Y^2 \left( z - \rho \frac{\sigma_X}{\sigma_Y} \right)^2 + \sigma_X^2 (1 - \rho^2)},$$

which is the frequency function of a Cauchy distribution with mode at the
point $z = \rho \sigma_X / \sigma_Y$, the value of the regression coefficient of $X$ on $Y$. If $X$ and $Y$
are independent, then $\rho = 0$, and the frequency function becomes

$$h(z) = \frac{\sigma_X \sigma_Y}{\pi} \cdot \frac{1}{\sigma_Y^2 z^2 + \sigma_X^2}. \tag{3.3}$$
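The closed form for independent normal variables with zero means can be compared with a direct numerical evaluation of the integral $\int |v| f(zv) g(v) \, dv$ for the frequency function of a quotient of independent variables. The sketch below is an illustration only: the midpoint rule, the truncation point `v_max`, the step count and the particular standard deviations are arbitrary choices made here, not part of the paper.

```python
from math import exp, pi, sqrt


def normal_pdf(x, sigma):
    """Frequency function of a normal variable with mean 0 and standard deviation sigma."""
    return exp(-0.5 * (x / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def quotient_density(z, f, g, v_max=12.0, steps=20_000):
    """Midpoint-rule approximation of h(z) = integral of |v| f(z v) g(v) dv over [-v_max, v_max]."""
    h = v_max / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        # The half-lines v > 0 and v < 0 are treated separately because of the factor |v|.
        total += v * (f(z * v) * g(v) + f(-z * v) * g(-v))
    return total * h


def cauchy_closed_form(z, sigma_x, sigma_y):
    """Closed form for independent zero-mean normal X and Y."""
    return (sigma_x * sigma_y / pi) / (sigma_y**2 * z**2 + sigma_x**2)


sigma_x, sigma_y = 2.0, 1.0
f = lambda x: normal_pdf(x, sigma_x)
g = lambda y: normal_pdf(y, sigma_y)
for z in (-3.0, 0.0, 0.5, 2.0):
    print(z, quotient_density(z, f, g), cauchy_closed_form(z, sigma_x, sigma_y))
```

The same `quotient_density` routine can be applied to other pairs of frequency functions, provided the truncation point is large enough for the tails of $g$ to be negligible.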


4. The quotient of two arbitrary independent variables. We shall
henceforth drop the restriction that $P(S)$ be absolutely continuous, but shall suppose
instead that $X$ and $Y$ are independent chance variables with one-dimensional
distributions of the most general type, except that the distribution of $Y$ will be
subject to the restriction that $P\{Y = 0\} = 0$.

We denote the d.f. of $X$ by $F(x)$, that of $Y$ by $G(y)$, and, as usual, that of $Z$
by $H(z)$. It is to be noticed that the condition $P\{Y = 0\} = 0$ implies that
$G(y)$ is continuous at the point $y = 0$. Let

$$f(t) = \int_{-\infty}^{+\infty} e^{itx} \, dF(x), \qquad g^+(t) = \int_{0}^{+\infty} e^{ity} \, dG(y), \qquad g^-(t) = \int_{-\infty}^{0} e^{ity} \, dG(y). \tag{4.1}$$

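Clearly

$$H(z) = P\{X - zY \le 0; \; Y > 0\} + P\{X - zY \ge 0; \; Y < 0\}. \tag{4.2}$$

We introduce the functions

$$\begin{aligned}
\Gamma_1(u) &= P\{X - zY \le u; \; Y > 0\} = [1 - G(0)] \cdot P\{X - zY \le u \mid Y > 0\}, \text{ 6} \\
\gamma_1(t) &= \int_{-\infty}^{+\infty} e^{itu} \, d\Gamma_1(u), \\
\Gamma_2(u) &= P\{zY - X \le u; \; Y < 0\} = G(0) \cdot P\{zY - X \le u \mid Y < 0\}, \\
\gamma_2(t) &= \int_{-\infty}^{+\infty} e^{itu} \, d\Gamma_2(u), \\
\Gamma(u) &= \Gamma_1(u) + \Gamma_2(u), \\
\gamma(t) &= \int_{-\infty}^{+\infty} e^{itu} \, d\Gamma(u) = \gamma_1(t) + \gamma_2(t).
\end{aligned} \tag{4.3}$$

By (4.2) and (4.3),

$$H(z) = \Gamma(0). \tag{4.4}$$

We shall now evaluate $\Gamma_1(u)$ and $\Gamma_2(u)$ in terms of $F(x)$ and $G(y)$, and also
$\gamma_1(t)$ and $\gamma_2(t)$ in terms of $f(t)$, $g^+(t)$, and $g^-(t)$.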

Let us assume for a moment that $P\{Y > 0\} \neq 0$; that is, that $G(0) < 1$.
The conditional distribution of $Y$ relative to the hypothesis that $Y > 0$ then
has the d.f.

$$G_1(y) = \begin{cases} \dfrac{G(y) - G(0)}{1 - G(0)}, & y \ge 0, \\ 0, & y < 0. \end{cases} \tag{4.5}$$

The d.f. of $-zY$ relative to this hypothesis is $G_1(-y/z)$ if $z < 0$, and
$1 - G_1[(-y/z) - 0]$ if $z > 0$.

6 By $P\{A \mid B\}$ is meant the conditional probability of the event $A$ relative to the
hypothesis $B$.

It is well known that the corresponding d.f. of the sum $X + (-zY)$ is given
by a convolution of the d.f.'s of $X$ and $(-zY)$. 7 In the present case, this result
takes the form

$$P\{X - zY \le u \mid Y > 0\} = \begin{cases} \displaystyle\int_{-\infty}^{+\infty} F(u - v) \, dG_1(-v/z), & z < 0, \\[2ex] \displaystyle\int_{-\infty}^{+\infty} F(u - v) \, d\{1 - G_1[(-v/z) - 0]\}, & z > 0. \end{cases} \tag{4.6}$$

Referring to the definition of these Lebesgue-Stieltjes integrals [4, pp. 662-663],
we see that the change of variables $w = -v/z$ yields the equations

$$P\{X - zY \le u \mid Y > 0\} = \begin{cases} \displaystyle\int_{0}^{+\infty} F(u + zw) \, dG_1(w), & z < 0, \\[2ex] \displaystyle\int_{0}^{+\infty} F(u + zw) \, dG_1(w - 0), & z > 0. \end{cases} \tag{4.7}$$

Now the definition of the variation of $G_1(y)$ [4, pp. 341-342] used in forming
these Lebesgue-Stieltjes integrals makes no distinction between the variation of
$G_1(y)$ and that of $G_1(y - 0)$ over any bounded set contained in an interval of
integration $a < y < \infty$, provided that $G_1(y)$ is continuous at $a$ in the two-sided
sense. Since $G_1(y)$ is continuous at $y = 0$ in this sense, it is possible to replace
$G_1(w - 0)$ by $G_1(w)$ in the second of the two integrals in (4.7).

Equation (4.7) is clearly true for $z = 0$ as well as for all other values of $z$.
Referring to (4.5) and (4.3), we see that

$$\Gamma_1(u) = \int_{0}^{+\infty} F(u + zw) \, dG(w), \quad \text{all } z.$$

The c.f. of the convolution (4.6) is the product of the c.f.'s of $X$ and of the
conditional distribution of $-zY$ [1, p. 36]. This product is $f(t) \cdot \int_{0}^{+\infty} e^{-itzy} \, dG_1(y)$.
Thus by (4.5), (4.3), and (4.1),

$$\gamma_1(t) = [1 - G(0)] \left[ f(t) \cdot \int_{0}^{+\infty} e^{-itzy} \, dG_1(y) \right] = f(t) \, g^+(-tz). \tag{4.8}$$

We have established (4.7) and (4.8) under the condition that $P\{Y > 0\} \neq 0$.
However, it is obvious that they are trivially true if $P\{Y > 0\} = 0$.

We turn now to $\Gamma_2(u)$. Supposing that $P\{Y < 0\} \neq 0$, the conditional
distribution of $Y$ relative to the hypothesis that $Y < 0$ has the d.f.

$$G_2(y) = \begin{cases} \dfrac{G(y)}{G(0)}, & y < 0, \\[4pt] 1, & y \geq 0. \end{cases}$$


7 See [1, pp. 36-36]; also [7].
The conditional distribution of $zY$ has the d.f. $G_2(y/z)$ for $z > 0$, and
$1 - G_2[(y/z) - 0]$ for $z < 0$. The d.f. of $-X$ is $1 - F(-x - 0)$. Thus

$$\begin{aligned} P\{zY - X \leq u \mid Y < 0\} &= \begin{cases} \displaystyle\int_{0}^{\infty} \{1 - F[-(u - v) - 0]\}\, d\{1 - G_2[(v/z) - 0]\}, & z < 0, \\[8pt] \displaystyle\int_{-\infty}^{0} \{1 - F[-(u - v) - 0]\}\, dG_2(v/z), & z > 0, \end{cases} \\ &= 1 - \int_{-\infty}^{0} F(zw - u - 0)\, dG_2(w). \end{aligned}$$


Evidently the first and last members of this equation are equal for $z = 0$ as well
as for all other values of $z$. From (4.3) we obtain

$$\Gamma_2(u) = G(0) - \int_{-\infty}^{0} F(zw - u - 0)\, dG(w), \quad \text{all } z.$$

Also, as before,

$$\gamma_2(t) = f(-t)\, g^-(zt).$$

Obviously, the last two equations are still true if $P\{Y < 0\} = 0$.

To summarize, we have shown that 

$$\Gamma(u) = G(0) + \int_{0}^{\infty} F(u + zw)\, dG(w) - \int_{-\infty}^{0} F(zw - u - 0)\, dG(w), \quad \text{all } z; \tag{4.9}$$

$$\gamma(t) = f(t)\, g^+(-zt) + f(-t)\, g^-(zt). \tag{4.10}$$

Referring now to (4.4) and letting u = 0 in (4.9), we are able to state the 
following theorem: 

Theorem 4.1: If $X$ and $Y$ are independent chance variables with respective
d.f.'s $F(x)$ and $G(y)$, the d.f. of the quotient $X/Y$ is given by the formula

$$H(z) = G(0) + \int_{0}^{\infty} F(zw)\, dG(w) - \int_{-\infty}^{0} F(zw - 0)\, dG(w) \tag{4.11}$$

for all values of $z$.
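As a rough numerical illustration of (4.11), the sketch below takes $X$ and $Y$ to be independent standard normal variables, an assumption made only for this example. In that case $F$ is continuous, so $F(zw - 0) = F(zw)$, and $dG(w)$ may be written as a density times $dw$. The quotient should then have the Cauchy d.f. $\frac{1}{2} + \frac{1}{\pi} \arctan z$. The function names and the midpoint-rule settings are illustrative choices, not part of the paper.

```python
import math

def normal_cdf(x):
    """Standard normal d.f., used here for both F and G (an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density, so that dG(w) = normal_pdf(w) dw."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def quotient_cdf(z, upper=10.0, steps=20000):
    """Approximate H(z) from (4.11) by the midpoint rule on (0, upper) and (-upper, 0)."""
    h = upper / steps
    positive_part = 0.0  # integral of F(zw) dG(w) over w > 0
    negative_part = 0.0  # integral of F(zw - 0) dG(w) over w < 0
    for i in range(steps):
        w = (i + 0.5) * h
        positive_part += normal_cdf(z * w) * normal_pdf(w) * h
        negative_part += normal_cdf(-z * w) * normal_pdf(-w) * h
    return normal_cdf(0.0) + positive_part - negative_part

def cauchy_cdf(z):
    """D.f. of the quotient of two independent standard normal variables."""
    return 0.5 + math.atan(z) / math.pi

for z in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(z, quotient_cdf(z), cauchy_cdf(z))
```

The two printed columns should agree closely; the residual difference comes from the truncation at `upper` and the finite step size.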

We shall not attempt to make a careful study of the above formula, such as 
the studies which certain writers have made of convolutions. However, it does 
seem desirable to place on record here certain remarks concerning it of a more 
or less superficial character. For convenience in later reference, we state these 
remarks in the form of four lemmas. 


Lemma 4.1: Let $M_1$ be the set of all values of $z$ such that if $z \in M_1$, the set of
discontinuity points of $F(zw)$ on the $w$-axis has a point in common with the
point spectrum of $G(w)$. Then if $z \in C(M_1)$,$^8$ the integrals $\int_0^\infty F(zw \pm 0)\, dG(w)$,
$\int_{-\infty}^0 F(zw \pm 0)\, dG(w)$ are Riemann-Stieltjes integrals and consequently the integrands
can be replaced by $F(zw)$ without altering the values of the integrals.

8 By $C(M_1)$ we mean the complement of $M_1$ with respect to the $z$-axis.

The lemma follows immediately from the definitions of Riemann-Stieltjes and 
Lebesgue-Stieltjes integrals. 

Lemma 4.2: The set $M_1$ is denumerable.

The proof can easily be supplied by the reader. 

Lemma 4.3: Let $M_2$ be the set of all values of $z$ such that if $z \in M_2$, $\Gamma(u)$ is
discontinuous at $u = 0$. Then $M_2 \subset M_1$.

To prove this statement, we first observe that $\Gamma(u)$ is a genuine d.f. [1, p. 11].
For obviously $\Gamma(-\infty) = 0$, $\Gamma(+\infty) = 1$, and since $\Gamma_1(u)$ and $\Gamma_2(u)$ are both
products of d.f.'s into constants, these two functions, and therefore $\Gamma(u)$, must
be continuous from the right. It is this last property of $\Gamma(u)$ which is needed
for our present purposes; in particular, we have the relation
$\lim_{u \to +0} \Gamma(u) = \Gamma(0) = H(z)$. On the other hand, by the general convergence theorem for
Lebesgue-Stieltjes integrals [4, pp. 663-664], we have

$$\lim_{u \to -0} \Gamma(u) = G(0) + \int_{0}^{\infty} F(zw - 0)\, dG(w) - \int_{-\infty}^{0} F(zw)\, dG(w).$$

If $z$ be chosen so that this integral and the ones in (4.11) are all Riemann-Stieltjes
integrals, the expression $(zw - 0)$, wherever it appears, may be replaced
by $zw$ without changing the values of the integrals. Thus for such a value of $z$,
$\Gamma(+0) = \Gamma(-0)$. According to Lemma 4.1, we can be sure that at least if
$z \in C(M_1)$, the integrals here will be Riemann-Stieltjes integrals, so our proposition
is proved.

Since $H(z_1 + 0)$ is equal to $\Gamma(+0)$ with $z = z_1$, and $H(z_1 - 0)$ is equal to
$\Gamma(-0)$ with $z = z_1$, we have the following result:

Lemma 4.4: The set $M_2$ is the set of discontinuity points of $H(z)$.

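By using the alternate form of the convolutions used to derive (4.9), we obtain
a representation of $\Gamma(u)$ somewhat more complicated than that appearing in
(4.9). The corresponding formula for $H(z)$ is as follows:

$$H(z) = \begin{cases} G(0)[1 - F(-0)] - G(0)F(0) + \displaystyle\int_{-\infty}^{0} G\!\left(\frac{v}{z}\right) dF(v) - \int_{0}^{\infty} G\!\left(\frac{v}{z} - 0\right) dF(v - 0), & z < 0; \\[8pt] F(0)[1 - G(0)] + G(0)[1 - F(-0)], & z = 0; \\[8pt] 1 + G(0)[1 - F(-0)] - G(0)F(0) + \displaystyle\int_{-\infty}^{0} G\!\left(\frac{v}{z}\right) dF(v - 0) - \int_{0}^{\infty} G\!\left(\frac{v}{z} - 0\right) dF(v), & z > 0. \end{cases} \tag{4.12}$$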
5. Representation of H(z) by characteristic functions. A simple algebraic
formula connecting the c.f. of $Z$ with those of $X$ and $Y$ is not available. However,
there exists an interesting representation of $H(z)$ in terms of the functions
$f(t)$, $g^+(t)$, and $g^-(t)$. The result may be stated as follows:

Theorem 5.1:$^9$ Let the distributions of the independent variables $X$ and $Y$ have
finite first absolute moments, and let the integral

$$\left( \int_{-\infty}^{-1} + \int_{1}^{\infty} \right) \left| \frac{f(t)\, g^+(-zt) + f(-t)\, g^-(zt)}{t} \right| dt \tag{5.1}$$

be finite for each value of $z$. Let $\Delta(u)$ be any d.f. with a finite first absolute moment,
and let $\left( \int_{-\infty}^{-1} + \int_{1}^{\infty} \right) \left| \frac{\delta(t)}{t} \right| dt$ be finite, where $\delta(t)$ is the c.f. of $\Delta(u)$. Then

$$H(z) = \Delta(0) - \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(t)\, g^+(-zt) + f(-t)\, g^-(zt) - \delta(t)}{t}\, dt. \tag{5.2}$$

If the integral obtained by formal differentiation under the integral sign with
respect to $z$ in (5.2) is uniformly convergent in a certain interval $I$, then the
frequency function $h(z)$ of the distribution of $Z$ exists in that interval and is given
by the formula

$$h(z) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \{ f(t)\, g^{+\prime}(-zt) - f(-t)\, g^{-\prime}(zt) \}\, dt, \quad z \in I.$$


We remark that the condition (5.1) will be satisfied for all values of $z$ if $f(t)$
alone satisfies a similar condition, inasmuch as $|g^+(t)| \leq 1$, $|g^-(t)| \leq 1$.
Important special cases of the theorem arise when $\Delta(u)$ is replaced by $F(u)$ or
$G(u)$, and when $\Delta(u)$ is so chosen that $\Delta(0) = 0$.

Our proof of the theorem will depend on a rather general result due to Cramér
[1, Theorem 12], which we shall restate here in the special form applicable to the
problem at hand.

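Lemma 5.1: Let $R(u)$ be a function of bounded variation over the infinite
interval $-\infty < u < \infty$, let $\lim_{u \to -\infty} R(u) = \lim_{u \to +\infty} R(u) = 0$, and let
$r(t) = \int_{-\infty}^{\infty} e^{itu}\, dR(u)$. If (a) $\int_{-\infty}^{\infty} |u|\, |dR(u)|$ and (b) $\left( \int_{-\infty}^{-1} + \int_{1}^{\infty} \right) \left| \frac{r(t)}{t} \right| dt$ both are
finite, then for every value of $u$,

$$R(u) = -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{r(t)}{t}\, e^{-itu}\, dt.$$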


To prove Theorem 5.1, we observe that since $\Gamma(u)$ is a d.f. (see proof of Lemma
4.3), the difference $\Gamma(u) - \Delta(u)$ is a function similar to the function $R(u)$ of the
lemma. If we do let $R(u) = \Gamma(u) - \Delta(u)$, it follows at once that
$r(t) = \gamma(t) - \delta(t) = f(t)\, g^+(-zt) + f(-t)\, g^-(zt) - \delta(t)$. If we can verify that this $R(u)$
satisfies conditions (a) and (b) of the lemma, then we shall have established the
relation,

$$\Gamma(u) - \Delta(u) = -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\gamma(t) - \delta(t)}{t}\, e^{-itu}\, dt$$

for all values of $u$, and letting $u = 0$ in this equation, we shall obtain (5.2).

Condition (b) in the lemma is taken care of by (5.1) and the condition on $\delta(t)$
in Theorem 5.1. Clearly condition (a) will be satisfied if it turns out that $\Gamma(u)$
has a finite first absolute moment. Now the existence of finite first absolute
moments of $X$ and $Y$ will insure the existence of finite first absolute moments
for the conditional distributions involved in the definitions of $\Gamma_1(u)$ and $\Gamma_2(u)$,
because $E|X - zY| \leq E|X| + |z|\, E|Y|$. It follows quite readily from
this that the first absolute moment of $\Gamma(u)$ is finite. The proof of the theorem
is complete.

9 The theorem is due to Cramér in the case in which $G(0) = 0$, and $\Delta(u) \equiv G(u)$. See
[1, Theorem 16].

6. Distributions of variable form. We consider now the case in which the 
distributions of the numerator and denominator approach limiting forms. 

Theorem 6.1: Let the independent variables $X_\alpha$ and $Y_\beta$ have respective d.f.'s
$F_\alpha(x)$ and $G_\beta(y)$ which depend upon the two parameters $\alpha$ and $\beta$. Let $H_{\alpha,\beta}(z)$ be
the d.f. of the quotient $Z_{\alpha,\beta} = X_\alpha / Y_\beta$. If there exist two chance variables $X$ and $Y$
with respective distribution functions $F(x)$ and $G(y)$ such that $\lim_{\alpha \to \infty} F_\alpha(x) = F(x)$
at all points of continuity of $F(x)$, and $\lim_{\beta \to \infty} G_\beta(y) = G(y)$ at all points of continuity
of $G(y)$, then

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} H_{\alpha,\beta}(z) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} H_{\alpha,\beta}(z) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} H_{\alpha,\beta}(z) = H(z) \tag{6.1}$$

at all points of continuity of $H(z)$, where $H(z)$ is the d.f. of the variable $X/Y$. The
double limit in (6.1) is uniform in any finite or infinite interval of continuity
of $H(z)$.

In the interpretation of the limits involved in this theorem, it is to be understood
that in the hypotheses, $\alpha$ may tend to infinity over any unbounded set
$T_\alpha$ of the $\alpha$-axis, and $\beta$ may tend to infinity over any unbounded set $T_\beta$ of the
$\beta$-axis, provided that in (6.1), $\alpha$ and $\beta$ are restricted so that $\alpha \in T_\alpha$ and $\beta \in T_\beta$.

To prove the theorem, we introduce functions $f_\alpha(t)$, $g^+_\beta(t)$, $g^-_\beta(t)$, $\Gamma_{\alpha,\beta}(u)$,
$\gamma_{\alpha,\beta}(t)$, which are defined by equations (4.1) and (4.3) with $F$, $G$, $X$, $Y$ replaced
respectively by $F_\alpha$, $G_\beta$, $X_\alpha$, $Y_\beta$. On the other hand, with reference to the
distributions of $X$ and $Y$, we employ the notation of section 4 without modification.
According to the work in that section, $\Gamma(u)$ is given by (4.9) and its c.f.
$\gamma(t)$ is given by (4.10). Also,

$$\gamma_{\alpha,\beta}(t) = f_\alpha(t)\, g^+_\beta(-zt) + f_\alpha(-t)\, g^-_\beta(zt).$$

But it is an immediate consequence of our hypotheses that $\lim_{\alpha \to \infty} f_\alpha(t) = f(t)$,
$\lim_{\beta \to \infty} g^+_\beta(t) = g^+(t)$, and $\lim_{\beta \to \infty} g^-_\beta(t) = g^-(t)$, all of the limits being uniform in any
finite interval of values of $t$.$^{10}$ Thus

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} \gamma_{\alpha,\beta}(t) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} \gamma_{\alpha,\beta}(t) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} \gamma_{\alpha,\beta}(t) = \gamma(t), \tag{6.2}$$

uniformly in any finite interval on the $t$-axis.

Consider the extreme members of (6.2). It follows immediately from a well-known
general theorem$^{11}$ that $\lim_{\alpha \to \infty, \beta \to \infty} \Gamma_{\alpha,\beta}(u) = \Gamma(u)$ at all continuity points of
$\Gamma(u)$. Then since $H_{\alpha,\beta}(z) = \Gamma_{\alpha,\beta}(0)$ and $H(z) = \Gamma(0)$, we find that

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} H_{\alpha,\beta}(z) = H(z), \quad z \in C(M_2),$$

where $M_2$ is the set defined in Lemma 4.3. By Lemma 4.4, the set $M_2$ is the
set of discontinuity points of $H(z)$, so the equality of the first and last members
of (6.1) is established at all continuity points of $H(z)$. The uniformity of the
limit is due to a general property of convergent sequences of d.f.'s; see [1, p. 31].

The existence and equivalence to $H(z)$ of each of the iterated limits in (6.1)
may be established by two consecutive applications of the foregoing argument,
and by the use of (6.2). We leave the details to the reader.

It is to be remarked that both $H_{\alpha,\beta}(z)$ and $H(z)$ can be represented by (4.11),
provided, of course, that $F$ and $G$ in (4.11) are replaced by $F_\alpha$ and $G_\beta$ in the
case of $H_{\alpha,\beta}$; thus our theorem essentially states that the order of the double
limit and the integration is immaterial in this formula. A similar remark
applies to formula (5.2).

The reader is reminded that we have tacitly been assuming that the d.f. of
any variable appearing in a denominator is continuous at the origin. In case
$G_\beta(y)$ does not satisfy this condition, but $G(y)$ does satisfy it, and if, as suggested
in section 2, we consider $H_{\alpha,\beta}(z)$ to be the d.f. of the conditional distribution of
$Z_{\alpha,\beta}$ relative to the hypothesis that $Y_\beta \neq 0$, then it can be shown rather easily
that Theorem 6.1 remains true with this modified interpretation. But if $G(y)$
is discontinuous at the origin, and if $H(z)$ is interpreted as the d.f. of the conditional
distribution, then (6.1) may be no longer true, as can be shown by trivial
examples.

Perhaps the most important cases of variable distributions arise in the consideration
of sums of independent chance variables. We accordingly present the
following synthesis of Theorem 6.1 and a simple case of the Central Limit
Theorem.

Theorem 6.2: Let $U_1, U_2, \cdots$, be a sequence of identically distributed chance
variables, each with mean zero and (finite) standard deviation $\sigma_U$, and let $V_1$,
$V_2, \cdots$, be a sequence of identically distributed chance variables, each with mean
zero and (finite) standard deviation $\sigma_V$. Furthermore, let the variables $U_i$ and $V_j$
be all independent, $i = 1, 2, \cdots$, $j = 1, 2, \cdots$. If $m$ and $n$ tend to infinity in
such a way that

$$\lim \sqrt{n/m} = k, \tag{6.3}$$

then the d.f. of the conditional distribution of the variable

$$W_{m,n} = \frac{U_1 + U_2 + \cdots + U_m}{V_1 + V_2 + \cdots + V_n}$$

relative to the hypothesis that the denominator is different from zero, tends uniformly
to the function

$$J(w) = \frac{1}{\pi} \int_{-\infty}^{w} \frac{k \sigma_U \sigma_V}{\sigma_V^2 k^2 u^2 + \sigma_U^2}\, du. \tag{6.4}$$

10 See [1, p. 30].

11 See [1, Theorem 11]. The result needed here is a trivial extension of the theorem
cited.

For if we let

$$Z_{m,n} = \frac{(U_1 + U_2 + \cdots + U_m) / (\sigma_U \sqrt{m})}{(V_1 + V_2 + \cdots + V_n) / (\sigma_V \sqrt{n})},$$

then $W_{m,n} = \sqrt{m/n}\, (\sigma_U / \sigma_V)\, Z_{m,n}$. The Central Limit Theorem [1, Theorem 20]
states that the d.f.'s of the numerator and denominator of $Z_{m,n}$ each tend to the
function $\int_{-\infty}^{x} (1/\sqrt{2\pi})\, e^{-t^2/2}\, dt$, which is the d.f. of a normal distribution with
mean zero and variance one. By (3.3), the quotient of two variables, each of
which has this d.f., has the continuous d.f. $H(z) = \int_{-\infty}^{z} (1/\pi) [1/(1 + x^2)]\, dx$.
If we let $H_{m,n}(z)$ denote the d.f. of the conditional distribution of $Z_{m,n}$, relative
to the hypothesis that the denominator of $Z_{m,n}$ is different from zero, then by
Theorem 6.1, $\lim_{m \to \infty, n \to \infty} H_{m,n}(z) = H(z)$ uniformly in $z$. Now the d.f. of the
conditional distribution of $W_{m,n}$ is $H_{m,n}[\sqrt{n/m}\, (\sigma_V / \sigma_U)\, w]$, and because of (6.3)
and the uniformity of the limit of $H_{m,n}(z)$, this approaches $H[k(\sigma_V / \sigma_U) w]$.
Differentiating the last expression with respect to $w$, we find that the resulting
frequency function is equal to $J'(w)$; and this concludes the proof.

As an application of the theorem, let us consider the following problem.
From an urn containing white and black balls in the proportion of $p$ to $1 - p$,
we shall make 100 random drawings of a single ball with replacement after each
drawing. Let $W_{50,50}$ be the ratio of the deviation of the number of white balls
in the first 50 drawings from the expected number, to the deviation of the number
of white balls in the second 50 drawings from the expected number. What is
the approximate value of $w$ for which $P\{W_{50,50} \geq w \mid b\} = .05$, where the
hypothesis $b$ is that the denominator of $W_{50,50}$ shall be different from zero?$^{12}$

To answer this question, we observe that the numerator and denominator of
$W_{50,50}$ can each be expressed as the sum of 50 independent identical chance
variables, each with mean zero and with variance $p(1 - p)$. Thus according
to Theorem 6.2, the approximate d.f. of $W_{50,50}$ is

$$J(w) = \int_{-\infty}^{w} \frac{1}{\pi}\, \frac{1}{1 + u^2}\, du = \frac{1}{2} + \frac{1}{\pi} \arctan w,$$

and the required value of $w$ satisfies the equation $J(\infty) - J(w) = .05$. The
solution of this equation (correct to one decimal place) is $w = 6.3$.
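The arithmetic behind the value $w = 6.3$ can be checked directly. Since $J(\infty) = 1$, the equation $J(\infty) - J(w) = .05$ reduces to $\arctan w = \pi (0.5 - 0.05)$. The short sketch below solves this in closed form; the names `J` and `tail` are illustrative only.

```python
import math

def J(w):
    """Limiting d.f. from the example: 1/2 + (1/pi) arctan w."""
    return 0.5 + math.atan(w) / math.pi

tail = 0.05
# J(inf) - J(w) = tail  <=>  arctan(w) = pi * (0.5 - tail)
w = math.tan(math.pi * (0.5 - tail))

print(w)             # the root of the equation
print(1.0 - J(w))    # should reproduce the tail probability 0.05
```

Rounded to one decimal place the root agrees with the value quoted in the text.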

It is perhaps needless to remark that a study of the error involved in supposing
$J(w)$ to be the d.f. of $W_{m,n}$ in Theorem 6.2 must necessarily precede the
unreserved acceptance of numerical results obtained by means of that theorem.


7. Products of chance variables. We conclude this paper with a rather brief
treatment of the distribution of the product of two chance variables. To preserve
a notation uniform with that of the preceding sections, we shall write the
product as $X = YZ$, where the d.f.'s of $X$, $Y$, and $Z$ are to be denoted, as before,
by $F(x)$, $G(y)$, and $H(z)$, respectively. The existence of $F(x)$ is readily proved
by the methods of section 2. The assumption that $P\{Y = 0\} = 0$ is of course
unnecessary here, and will be dropped in this section.

In the continuous case, an argument similar to the one employed in section 3 
will establish the following result: 

Theorem 7.1: If the joint variable (Y, Z) has the frequency function z ), 

then 


F(x) = 




and consequently F{x) is an absolutely continuous function of x. The frequency 
function of the distribution of X exists almost everywhere, and is given by the formula 


(7.1) 


m -rw-£\i I *(?,■>)*-£ 



In the discontinuous case, with $Y$ and $Z$ independent, we can write
$X = ZY = Z/(1/Y)$ and use Theorem 4.1 to derive a formula for $F(x)$. We have:

$$F(x) = P\{X \leq x\} = P\{Y \neq 0\}\, P\{X \leq x \mid Y \neq 0\} + P\{X \leq x;\ Y = 0\}.$$


12 This hypothesis would always be fulfilled in case $50p$ is not an integer.
Excluding for a moment the trivial case in which $P\{Y \neq 0\} = 0$, let $G_1(y)$ be
the d.f. of the conditional distribution of $(1/Y)$ relative to the hypothesis that
$Y \neq 0$. Then

$$P\{Y \neq 0\}\, G_1(y) = \begin{cases} G(-0) + 1 - G[(1/y) - 0], & y > 0, \\ G(-0), & y = 0, \\ G(-0) - G[(1/y) - 0], & y < 0. \end{cases}$$


It is to be observed that $G_1(y)$ is continuous at $y = 0$. Using Theorem 4.1, we
find that

$$P\{X \leq x \mid Y \neq 0\} = G_1(0) + \int_{0}^{\infty} H(xw)\, dG_1(w) - \int_{-\infty}^{0} H(xw - 0)\, dG_1(w).$$
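So

$$\begin{aligned} P\{Y \neq 0\}\, P\{X \leq x \mid Y \neq 0\} &= G(-0) + \int_{0}^{\infty} H(xw)\, d\{-G[(1/w) - 0]\} - \int_{-\infty}^{0} H(xw - 0)\, d\{-G[(1/w) - 0]\} \\ &= G(-0) + \int_{0+0}^{\infty} H\!\left(\frac{x}{v}\right) dG(v) - \int_{-\infty}^{0-0} H\!\left(\frac{x}{v} - 0\right) dG(v). \end{aligned}$$

This equation is trivially true if $P\{Y \neq 0\} = 0$. Also,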


$$P\{X \leq x;\ Y = 0\} = \begin{cases} 0, & x < 0, \\ G(0) - G(-0), & x \geq 0. \end{cases}$$

Thus we obtain the following theorem: 

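Theorem 7.2: If $Y$ and $Z$ are independent chance variables with respective d.f.'s
$G(y)$ and $H(z)$, then the d.f. of their product is given by the formula

$$F(x) = \int_{0+0}^{\infty} H\!\left(\frac{x}{v}\right) dG(v) - \int_{-\infty}^{0-0} H\!\left(\frac{x}{v} - 0\right) dG(v) + \begin{cases} G(-0), & x < 0, \\ G(0), & x \geq 0, \end{cases} \tag{7.2}$$

for all values of $x$.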
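For purely discrete variables the Stieltjes integrals in (7.2) become finite sums, so the formula can be compared with a brute-force enumeration. The sketch below is one way to do this, assuming each distribution is given as a dictionary mapping values to exact probabilities; the helper names and the sample distributions are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def cdf(dist, x, strict=False):
    """P{V <= x}, or P{V < x} when strict=True, for a discrete law {value: prob}."""
    return sum(p for v, p in dist.items() if (v < x if strict else v <= x))

def product_cdf_formula(Y, Z, x):
    """F(x) from (7.2): G is the d.f. of Y, H the d.f. of Z."""
    # Integral over v > 0 of H(x/v) dG(v).
    positive = sum(p * cdf(Z, Fraction(x) / v) for v, p in Y.items() if v > 0)
    # Integral over v < 0 of H((x/v) - 0) dG(v).
    negative = sum(p * cdf(Z, Fraction(x) / v, strict=True)
                   for v, p in Y.items() if v < 0)
    # G(-0) for x < 0, G(0) for x >= 0.
    constant = cdf(Y, 0, strict=True) if x < 0 else cdf(Y, 0)
    return positive - negative + constant

def product_cdf_direct(Y, Z, x):
    """P{YZ <= x} by enumerating all pairs of values."""
    return sum(py * pz for y, py in Y.items() for z, pz in Z.items() if y * z <= x)

Y = {-2: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
Z = {-1: Fraction(1, 3), 0: Fraction(1, 3), 3: Fraction(1, 3)}
for x in (-7, -6, -3, -1, 0, 1, 2, 3, 5):
    print(x, product_cdf_formula(Y, Z, x), product_cdf_direct(Y, Z, x))
```

Exact rational arithmetic is used so that the two columns can be compared for equality rather than to within rounding error. Note that $Y$ is allowed an atom at the origin, which is the case the constant term in (7.2) accounts for.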
SOME GENERALIZATIONS OF THE LOGARITHMIC MEAN AND OF
SIMILAR MEANS OF TWO VARIATES WHICH BECOME
INDETERMINATE WHEN THE TWO VARIATES ARE EQUAL

By Edward L. Dodd
University of Texas 

1. Introduction. The logarithmic mean $m$ of positive numbers, $x$ and $y$, as
given by

$$m = \frac{y - x}{\log_e y - \log_e x} = \frac{y - x}{\log_e (y/x)} \tag{1}$$

is of considerable importance in problems$^1$ relating to the flow of heat.

The logarithmic mean arises, moreover, in less technical problems such as the
following: Given that incomes $t$ in the interval, $x \leq t \leq y$, are distributed with
frequency inversely proportional to $t$. That is, with $k$ a positive constant,

$$\phi(t)\, dt = (k/t)\, dt \tag{2}$$

is the number of individuals with incomes lying between $t$ and $t + dt$. Then,
with $x > 0$, the total number $f$ of individual incomes is

$$f = \int_x^y \phi(t)\, dt = k(\log y - \log x). \tag{3}$$

The combined income $g$ of the group is

$$g = \int_x^y t \phi(t)\, dt = k(y - x). \tag{4}$$

And thus the logarithmic mean $g/f$ of the two numbers $x$ and $y$ in (1) is the
arithmetic mean of all the incomes; that is, the average income, at least to a
close approximation if the group is large enough that integration may replace
summation.

Now $m$ in (1) becomes indeterminate, if $x = y$. Nevertheless, if $c > 0$, and
$x \to c$ and $y \to c$, then $m \to c$. Thus, we may properly speak of $m$ as a mean of
these two variates, $x$ and $y$.

This logarithmic mean is one of a set of means studied by Renzo Cisbani,$^2$ the
general form being

1 See Walker, Lewis, and McAdams, Principles of Chemical Engineering, McGraw Hill &
Co., Part IV, Logarithmic mean temperature difference.

2 R. Cisbani, “Contributi alla teoria delle medie.” Metron, Vol. 13 (1938), pp. 23-34.
(5)


■ j*+y _ a *+> ni/» 

JWi + i W - oO. 


and the logarithmic mean appearing when x = 1, j —* 0. 

In a chart between pages 28 and 29 Cisbani exhibits thirty varieties of these
means (5). It will be noticed that $z$ is indeterminate if $a = b$.

Some methods for dealing with means which may become indeterminate
forms I have indicated in a recent paper.$^3$

Now a generalization from a mean of two variates to a mean of three or more
variates may sometimes seem to be immediate. However, for the arithmetic
mean $(x + y)/2$ of two variates $x$ and $y$, the function
$[\min(x, y, z) + \max(x, y, z)]/2$ is as much a generalization as is the arithmetic mean $(x + y + z)/3$.
Actually, the direction in which generalization is to take place is arbitrary.
However, it is natural to expect the generalization to arise from a problem
somewhat similar to one that may give rise to the original mean. And it is
desirable that to the generalization should be carried over as many properties
or characteristics of the original as is possible.

In the foregoing illustration, we considered a single interval $x \leq t \leq y$ in
which incomes are distributed in accordance with a relative frequency proportional
to $\phi(t)$. And the arithmetic mean of all these incomes was obtained as a
logarithmic mean of the two range limits $x$ and $y$, at least approximately, allowing
integration to take the place of summation. If $\phi(t)$ had been $kt^{-3/2}$, instead
of $kt^{-1}$, then the average of all the incomes would have been the geometric mean
of the two range limits $x$ and $y$.

To effect a first generalization, we shall now suppose an original interval $x_0$ to
$x_n$, to be divided into $n$ subintervals by points $x_r$ such that

$$x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n. \tag{6}$$

For each subinterval $x_{r-1}$ to $x_r$ the same function $\phi(t)$ will be used to describe
the relative frequency; but the total population for this subinterval will be controlled
by a positive constant $k_r$, in general different for the different subintervals.
This may be described as stratification. To make this more concrete, let us
suppose, as before, that $\phi(t) = k/t$. Then, with $x_0 > 0$, the mean $M$, which
will be described more in detail in the next section, will take the form

$$M = \frac{\sum_{r=1}^{n} k_r (x_r - x_{r-1})}{\sum_{r=1}^{n} k_r \log (x_r / x_{r-1})}. \tag{7}$$

Applied to incomes, $M$ would, like $m$ in (1), give average income. To get
some idea of the significance of $k_r$, let us imagine that in some community there
are $f_r$ individuals in the income bracket $x_{r-1}$ to $x_r$, say from \$1001 to \$2000.
Let us suppose now that $f_r$ other individuals with incomes between \$1001 and
\$2000 distributed in exactly the same manner move into this same community.
Then $k_r$ would be changed to $k_r' = 2k_r$. But, of course, among the entire $2f_r$
individuals the relative distribution of incomes is exactly the same as among the
original $f_r$ individuals.

In this interpretation $k_r$ is a weight for a bracket of items. But, taking $M$ in
(7) just as it stands, $k_r$ is the weight for the consecutive pair of numbers $x_{r-1}$
and $x_r$.

3 “The substitutive mean and certain subclasses of this general mean.” Annals of Math.
Stat., Vol. 11 (1940), pp. 163-176. See p. 171.
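The stratified mean (7) is simple to compute. The sketch below is a minimal illustration, assuming the break points are supplied in increasing order as a list `xs` and the bracket weights as a list `ks` with one entry per subinterval; these names and the sample figures are hypothetical.

```python
import math

def stratified_log_mean(xs, ks):
    """Mean (7): xs are increasing positive break points, ks positive bracket weights."""
    brackets = list(zip(xs, xs[1:]))
    numerator = sum(k * (hi - lo) for k, (lo, hi) in zip(ks, brackets))
    denominator = sum(k * math.log(hi / lo) for k, (lo, hi) in zip(ks, brackets))
    return numerator / denominator

def log_mean(x, y):
    """Logarithmic mean (1) of two distinct positive numbers."""
    return (y - x) / math.log(y / x)

xs = [1000.0, 2000.0, 5000.0]
print(stratified_log_mean(xs, [3.0, 1.0]))   # lies between 1000 and 5000
print(stratified_log_mean(xs, [6.0, 2.0]))   # doubling every weight changes nothing
print(stratified_log_mean([1000.0, 2000.0], [1.0]), log_mean(1000.0, 2000.0))
```

With a single bracket the function reduces to the logarithmic mean (1), and multiplying all the weights by a common factor leaves the result unchanged, since only the ratios of the $k_r$ matter in (7).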

2. The first generalization. When $t$ is in some interval, $I = (a, a')$, finite or
infinite, let $\phi(t)$ be a non-negative, integrable function of $t$.

And in $I$ let the points at which $\phi(t) = 0$, if any, form a null-set. Then, with
$t$ in $I$, write

$$\Phi(t) = \int_a^t \phi(t)\, dt. \tag{8}$$

And, supposing that in (6), $a < x_0$, $x_n < a'$, set

$$f_r = \int_{x_{r-1}}^{x_r} \phi(t)\, dt = \Phi(x_r) - \Phi(x_{r-1}); \quad r = 1, 2, \cdots, n. \tag{9}$$

Then $f_r > 0$; since $\phi(t) > 0$ and is continuous almost everywhere in $(x_{r-1},
x_r)$. Since in any finite subinterval of $I$, $t\phi(t)$ is integrable, we may set

$$\Psi(t) = \int_a^t \psi(t)\, dt = \int_a^t t\phi(t)\, dt. \tag{10}$$

$$g_r = \int_{x_{r-1}}^{x_r} \psi(t)\, dt = \Psi(x_r) - \Psi(x_{r-1}). \tag{11}$$

Now, by a mean value theorem, there exists a number $t_r'$, such that

$$g_r / f_r = t_r', \quad x_{r-1} < t_r' < x_r. \tag{12}$$

Taking positive numbers $k_r$, the weighted arithmetic mean of $g_r / f_r$, with
weights $k_r f_r$ is then

$$M = \frac{\sum_{r=1}^{n} k_r [\Psi(x_r) - \Psi(x_{r-1})]}{\sum_{r=1}^{n} k_r [\Phi(x_r) - \Phi(x_{r-1})]}. \tag{13}$$

If $\phi(t) = k/t$, this becomes the mean (7) associated with the logarithmic
mean. Now, since for (13) the weights $k_r f_r$ are positive, it follows from (12)
that

$$x_0 < t_1' \leq M \leq t_n' < x_n. \tag{14}$$

Suppose, now, that $b$ lies in $I$, and that subject to (6) each $x_r \to b$. Then,
by (14), $M \to b$. And thus $M$ is an internal mean of $x_0, x_1, \cdots, x_n$, although
with the $x$'s all equal, $M$ assumes an indeterminate form.

In (13) the weights $k_r$ are applied to pairs of numbers, either to $\Psi(x_r) - \Psi(x_{r-1})$
or to $\Phi(x_r) - \Phi(x_{r-1})$, whereas in most weighted means, the weights are applied
to individual numbers. We consider now a form equivalent to (13), but in
which the weights $c_r$ are attached to the individual numbers. It seemed possible
to get a more general mean than (13) by abandoning certain conditions upon
the weights $c_r$ which first arose. But such relaxing of restrictions leads to difficulties,
as will be shown. By setting

$$c_0 = -k_1, \quad c_n = k_n; \quad c_r = k_r - k_{r+1}, \quad r = 1, 2, \cdots, n - 1, \tag{15}$$

we may write $M$ in the form:

$$M = \frac{\sum_{r=0}^{n} c_r \Psi(x_r)}{\sum_{r=0}^{n} c_r \Phi(x_r)}. \tag{16}$$

On the other hand, if we choose $c$'s subject to

$$c_0 < 0, \quad c_r < -(c_0 + c_1 + \cdots + c_{r-1}) \ \text{for} \ 0 < r < n, \tag{17}$$

$$c_n = -\sum_{r=0}^{n-1} c_r, \tag{18}$$

then positive $k$'s can be found to pass from (16) back to (13).

The question arises whether if the conditions (17) are abandoned, and with
the $c_r$ not all zero, (18) is retained as

$$\sum_{r=0}^{n} c_r = 0; \quad \text{some } c_r \neq 0, \tag{19}$$

$M$ in (16) will continue to be a mean of $x_0, x_1, \cdots, x_n$, possibly, an external
mean.

It may be noted that the condition $\sum c_r = 0$ arises from the fact that when
parentheses are removed from (13), each $k_r$ is matched by $-k_r$.

By an example, it will be shown that under (19) alone, $M$ in (16) may fail
to be a mean. In (8) and (10) take $a = 0$. Then with $n = 2$, $\phi(t) = 1$, take
$c_0 = 1$, $c_1 = -2$, $c_2 = 1$ in (16). Then

$$M = \frac{x_0^2 - 2x_1^2 + x_2^2}{2(x_0 - 2x_1 + x_2)}. \tag{20}$$

If $b > 0$, $\epsilon = x_0 - b$, $\eta = x_1 - b$, and $\xi = x_2 - b$, then

$$M = b + \frac{\epsilon^2 - 2\eta^2 + \xi^2}{2(\epsilon - 2\eta + \xi)}. \tag{21}$$

If now $\eta = 2\epsilon$, and $\xi = 3\epsilon + \epsilon^2$, then

$$M = b + (2 + 6\epsilon + \epsilon^2)/2 \to b + 1, \quad \text{as } \epsilon \to 0. \tag{22}$$

Since $M$ does not approach $b$ here, when $x_0$, $x_1$, and $x_2 \to b$, in the manner
specified, $M$ in (20) is not a mean of $x_0$, $x_1$, and $x_2$.
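The counterexample can be verified with exact rational arithmetic. The sketch below assumes $\Phi(t) = t$ and $\Psi(t) = t^2/2$ (that is, $\phi(t) = 1$ with $a = 0$), which is the choice that produces a quotient of the form (20) with $c_0 = 1$, $c_1 = -2$, $c_2 = 1$. The function names are illustrative only.

```python
from fractions import Fraction

def M(x0, x1, x2):
    """Quotient (16) with c = (1, -2, 1), Phi(t) = t and Psi(t) = t**2 / 2."""
    numerator = (x0**2 - 2 * x1**2 + x2**2) / 2
    denominator = x0 - 2 * x1 + x2
    return numerator / denominator

def path(b, eps):
    """The approach to b used in the text: eta = 2*eps, xi = 3*eps + eps**2."""
    return b + eps, b + 2 * eps, b + 3 * eps + eps**2

b = Fraction(5)
for k in (1, 2, 3, 6):
    eps = Fraction(1, 10**k)
    print(float(eps), float(M(*path(b, eps)) - b))
```

As `eps` shrinks, all three arguments tend to $b$, yet the printed difference $M - b$ tends to 1 rather than 0, in agreement with (22).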

We may enquire, further, whether the function $M$ in (16) could be a mean if,
discarding (13), (17) and (18), we put upon $c_r$ the single restriction $c_r > 0$. In
that case, if $x_0 < t < x_n$, then, since $\Phi(t)$ and $\Psi(t)$ are continuous functions of
$t$ (see (8), (10)), it would follow that if each $x_r \to t$, then $M \to \Psi(t)/\Phi(t)$. But
if $M$ is to be a mean of $x_0, x_1, \cdots, x_n$, then $M \to t$ when each $x_r \to t$. Thus
we are led to $\Psi(t) = t\Phi(t)$. Except possibly for points of a null set, $\Phi(t)$ and $\Psi(t)$
have derivatives $\phi(t)$ and $\psi(t)$; and thus

$$\psi(t) = \Psi'(t) = t\Phi'(t) + \Phi(t) = t\phi(t) + \Phi(t). \tag{23}$$

But then, since $\psi(t) = t\phi(t)$ (see (10)), it would follow that $\Phi(t) = 0$ almost
everywhere in $I$; but $\Phi(t) > 0$, if $t > a$. Hence the assumption $c_r > 0$ is not
sufficient to make the function in (16) a mean of $x_0, x_1, \cdots, x_n$.

In the simple case of $n = 1$, $M$ becomes

$$M = \frac{\Psi(x_1) - \Psi(x_0)}{\Phi(x_1) - \Phi(x_0)}; \tag{24}$$

and this is a symmetrical function of $x_0$ and $x_1$.

The question arises whether if $n > 1$, $M$ in (13) or (16) can be a symmetrical
function of $x_0, x_1, \cdots, x_n$. Assume, if possible, that with $x < y < z$,

$$H(x, y, z) = \frac{c_0 \Psi(x) + c_1 \Psi(y) + c_2 \Psi(z)}{c_0 \Phi(x) + c_1 \Phi(y) + c_2 \Phi(z)} \tag{25}$$

is a symmetrical function of $x$, $y$ and $z$. Now if $a/b = c/d$, and $b - d \neq 0$, it
is well known that $a/b = (a - c)/(b - d)$.

Hence, if $H(x, y, z) = H(z, y, x)$, and $c_0 \neq c_2$, then

$$H(x, y, z) = \frac{(c_0 - c_2) [\Psi(x) - \Psi(z)]}{(c_0 - c_2) [\Phi(x) - \Phi(z)]}, \tag{26}$$

which is not symmetrical in the three variables. Then $H$ is not symmetrical
in $x$, $y$ and $z$, unless, possibly, when $c_0 = c_2$.

Likewise from $H(x, y, z) = H(x, z, y)$, we are led to the conclusion that $H$
is not a symmetrical function of $x$, $y$, and $z$, unless possibly when $c_1 = c_2$. But
$c_0 = c_1 = c_2$ substituted into (15) makes $k_1 = k_2 = 0$, which is contrary to
hypothesis that $k_r > 0$. Then in (25) the constants $c_0$, $c_1$ and $c_2$ can not be
chosen in conformity with (15) so as to make $H(x, y, z)$ a symmetrical function
of the three variables.

Symmetry in two variables will appear, however, if the mean (13) reduces
to a mean of just two variables as it does when each $k_r = k$, constant, in which
case,

$$M = \frac{\Psi(x_n) - \Psi(x_0)}{\Phi(x_n) - \Phi(x_0)}. \tag{27}$$

Although in the generalization (13) symmetry is thus lost, another property,
homogeneity, is retained in what seem to be the most important cases.

Most means $\Omega(x, y, \cdots, w)$ in common use are homogeneous functions of their
arguments. That is, if $c$ is a constant, and $\Omega(x, y, \cdots, w)$ and $\Omega(cx, cy, \cdots, cw)$
are both defined when $x, y, \cdots, w$ lie in some interval $J$, then

$$\Omega(cx, cy, \cdots, cw) = c\, \Omega(x, y, \cdots, w). \tag{28}$$

This homogeneity is associated geometrically with ruled surfaces, in particular
with cones.

With reference to (8) and (10), let us write

$$F(x, y) = \frac{\Psi(y) - \Psi(x)}{\Phi(y) - \Phi(x)}. \tag{29}$$

And now, let us consider a special variety of means obtained by taking in (8)

$$\phi(t) = t^q, \tag{30}$$

where $q$ is any real number. Then $F(x, y)$ is a homogeneous mean; that is,

$$F(cx, cy) = cF(x, y). \tag{31}$$

This is valid, indeed, even in the special cases, $q = 0$, $-1$, and $-2$, which lead,
respectively to the arithmetic mean, the logarithmic mean (1) and to a second
variety of logarithmic mean

$$F(x, y) = \frac{xy (\log y - \log x)}{y - x} \tag{32}$$

exhibited by Cisbani. It may be noted that $q = -3/2$ leads to the geometric
mean, and $q = -3$ to the harmonic mean of $x$ and $y$.
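The special cases just listed can be checked numerically from (29) and (30). The sketch below uses antiderivatives of $t^q$ and $t^{q+1}$ in place of the integrals from $a$, which is harmless because only differences of $\Phi$ and $\Psi$ enter (29). The function names and the sample values $x = 2$, $y = 8$ are illustrative assumptions.

```python
import math

def Phi(t, q):
    """An antiderivative of phi(t) = t**q (logarithm when q = -1)."""
    return math.log(t) if q == -1 else t ** (q + 1) / (q + 1)

def Psi(t, q):
    """An antiderivative of t * phi(t) = t**(q + 1) (logarithm when q = -2)."""
    return math.log(t) if q == -2 else t ** (q + 2) / (q + 2)

def F(x, y, q):
    """The two-variable mean (29) for phi(t) = t**q, with 0 < x < y."""
    return (Psi(y, q) - Psi(x, q)) / (Phi(y, q) - Phi(x, q))

x, y = 2.0, 8.0
print(F(x, y, 0), (x + y) / 2)                              # arithmetic mean
print(F(x, y, -1), (y - x) / math.log(y / x))               # logarithmic mean (1)
print(F(x, y, -2), x * y * math.log(y / x) / (y - x))       # second logarithmic mean
print(F(x, y, -1.5), math.sqrt(x * y))                      # geometric mean
print(F(x, y, -3), 2 * x * y / (x + y))                     # harmonic mean
print(F(3 * x, 3 * y, -1), 3 * F(x, y, -1))                 # homogeneity (31)
```

Each pair of printed numbers should agree to within rounding error.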

It is conceivable that for $\phi(t)$ other functions than $t^q$, functions not equivalent
to $t^q$ in integration, might be used to lead to a homogeneous $F(x, y)$ in (29).
But such functions, if any, would hardly seem to be in common use.

The $M$ in (13) retains the property of homogeneity, at least for $\phi(t) = t^q$;
and so will also the more general means exhibited in the next section.


3. Further generalization. The means of Cisbani (5) suggest the following
generalization. Let $p$ be an integer or the reciprocal of an odd integer. With
the notation of (13), take $k_r > 0$, and

$$F_p = \sum_{r=1}^{n} k_r f_r^p, \quad G_p = \sum_{r=1}^{n} k_r g_r^p, \tag{33}$$

$$M_p = [G_p / F_p]^{1/p}. \tag{34}$$

Indeed, if in (8) and (10), $a \geq 0$, then $g_r > 0$; and we may take for $p$ any real
number except zero. Now, $M_p^p$ may be described as the weighted arithmetic
mean of $(g_r / f_r)^p$ with positive weights $k_r f_r^p$. And hence $M_p$ is an internal mean
of $x_0, x_1, \cdots, x_n$; that is

$$x_0 < M_p < x_n. \tag{35}$$

Furthermore, if in (8), $\phi(t) = t^q$, where $q$ is any real number, then $M_p$ is a
homogeneous mean of $x_0, x_1, \cdots, x_n$.
Another generalization may be obtained by writing

$$m_r = g_r / f_r, \tag{36}$$

$$M_p' = \left[ \sum k_r m_r^p \Big/ \sum k_r \right]^{1/p}. \tag{37}$$

And still another

$$M'' = \left[ m_1^{k_1} m_2^{k_2} \cdots m_n^{k_n} \right]^{1/\sum k_r}. \tag{38}$$

These means (37) and (38) are internal; and they are homogeneous, if $F(x, y)$
in (29) is homogeneous.

The foregoing means are not, for $n > 1$, symmetrical functions of
$x_1, x_2, \cdots, x_n$. Now the mere abandonment of (6) may lead to functions like
(20) which are not means at all. But symmetry may be introduced as follows.
First, lay aside (6), but suppose that the $x_r$ are all different. Then let

$$f_{r,s} = \int_{x_r}^{x_s} \phi(t)\, dt, \quad g_{r,s} = \int_{x_r}^{x_s} t\phi(t)\, dt; \tag{39}$$

where $r = 0, 1, \cdots, (n - 1)$; $r < s \leq n$. Then, let

$$U = \sum f_{r,s}^2, \quad V = \sum g_{r,s}^2; \tag{40}$$

where $U$ and $V$ is each a sum of $n(n - 1)/2$ terms. Let $W$ be the double-valued
mean

$$W = \pm [V/U]^{1/2}. \tag{41}$$

Then $W$ is a symmetric function of $x_0, x_1, \cdots, x_n$. If, in (8), $a' \leq 0$, then
in (12) each $g_r / f_r < 0$; and in (41) the negative value of $W$ is an internal mean.
But the positive radical is external. On the other hand, if $a \geq 0$; then $g_r / f_r > 0$;
and the positive radical in (41) is internal. In this case, it may be well to use
for $W$ only the positive value of $W$.

In the more general case where $a < 0$ and $a' > 0$, the fractions $g_r / f_r$ may have
different signs. But, in all cases, at least one of the two radicals (41) is an
internal mean of $x_0, x_1, \cdots, x_n$. Moreover, $W$ is homogeneous, if in (8),
$\phi(t) = t^q$.

Finally, let

$$m_{r,s} = g_{r,s} / f_{r,s}, \tag{42}$$

$$Z = \pm \left\{ \left[ 2 \sum m_{r,s}^2 \right] \Big/ n(n - 1) \right\}^{1/2}. \tag{43}$$

Then $Z$ is symmetric; and at least one value is internal. If $a > 0$, we would
naturally take $Z > 0$; and this $Z$ is then an internal mean. Moreover, $Z$ is
homogeneous if the $m_{r,s}$ are homogeneous; that is, if $F(x, y)$ in (29) is homogeneous
for every $x$ and $y$ in $I$.



A STUDY OF R. A. FISHER’S z DISTRIBUTION AND THE RELATED 

F DISTRIBUTION 1 


By Leo A. Aroian 
Hunter College 


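1. Nature of the problem. Consider two samples of $N_1$ and $N_2$ drawings,
each sample drawn from one of two populations consisting of variates normally
distributed with equal population variances $\sigma^2$. We define the two sample
means $\bar{x}_1 = \Sigma x_i / N_1$, $\bar{x}_2 = \Sigma x_j / N_2$, the $x_i$'s and $x_j$'s independent variates. We calculate
from the two samples

$s_1^2 = \dfrac{\Sigma (x_i - \bar{x}_1)^2}{n_1}$ and $s_2^2 = \dfrac{\Sigma (x_j - \bar{x}_2)^2}{n_2}$, $\quad n_1 = N_1 - 1$, $n_2 = N_2 - 1$.

The distribution of $z = \tfrac{1}{2} \log (s_1^2/s_2^2)$ is well known.

(1.1) $P(z) = \dfrac{2\, n_1^{n_1/2}\, n_2^{n_2/2}}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \cdot \dfrac{e^{n_1 z}\, dz}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1 + n_2)}}$.
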

We shall denote the ordinates by y(z). The purpose of this study is to discuss
the seminvariants of the z distribution and also to find useful approximations
for them; to show that as $n_1$ and $n_2$ approach infinity in any manner whatever
the distribution of z approaches normality; to find the upper bound of the absolute
value of the difference between the distribution function of z and the
function determined by the approximate seminvariants of the distribution of z
for $n_1$ and $n_2$ large; to approximate the z distribution by the Type III distribution,
the Gram-Charlier Type A series, and the logarithmic frequency curve;
and finally to investigate the same properties with respect to the F distribution,
where $F = e^{2z} = s_1^2/s_2^2$. The non-existence of the moments of F for certain values
of $n_1$ and $n_2$ is noted and explained on the basis of the distribution of a quotient.
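As a numerical illustration of the density (1.1), the following Python sketch evaluates the ordinate through logarithms and checks that it integrates to one. It reads (1.1) as $y(z) = 2 n_1^{n_1/2} n_2^{n_2/2} e^{n_1 z} / [B(n_1/2, n_2/2)(n_1 e^{2z} + n_2)^{(n_1+n_2)/2}]$; the function names and the Simpson integrator are illustrative choices, not part of the paper.

```python
import math


def log_beta(a, b):
    """Logarithm of the complete Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def z_density(z, n1, n2):
    """Ordinate y(z) of the z distribution, computed via logs to avoid overflow."""
    log_y = (math.log(2.0)
             + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
             - log_beta(0.5 * n1, 0.5 * n2)
             + n1 * z
             - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2.0 * z) + n2))
    return math.exp(log_y)


def simpson(f, a, b, panels=4000):
    """Composite Simpson rule on [a, b]; `panels` must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Total area under y(z) for n1 = 4, n2 = 8 (tails beyond |z| = 12 are negligible).
area = simpson(lambda z: z_density(z, 4, 8), -12.0, 12.0)
```

Interchanging $n_1$ and $n_2$ while replacing z by -z leaves the ordinate unchanged, which the same function can be used to confirm.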


1 Presented to the American Mathematical Society, September 10,1938, New York City 
in part; and to the Institute December 27, 1939 at Philadelphia. 



2. General features of the z distribution. The z distribution is always unimodal,
asymmetrical if $n_1 \ne n_2$, and symmetrical if $n_1 = n_2$. We see that
interchanging $n_1$ and $n_2$ is the same as replacing z by -z. Fisher [7] noted that
the two parameter family of curves includes as special cases the normal curve,
the $\chi^2$ distribution, and Student's distribution. The mode is at z = 0, the
maximum ordinate is

$y(0) = \dfrac{2\, n_1^{n_1/2}\, n_2^{n_2/2}\, (n_1 + n_2)^{-\frac{1}{2}(n_1+n_2)}}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}$

or approximately

(2.1) $y(0) = \left\{\pi\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{-1/2}$ for $n_1$ and $n_2$ large.

The two points of inflection are

(2.2) $z = \tfrac{1}{2} \log \left\{ \dfrac{n_1 n_2 + n_1 + n_2 \pm \sqrt{n_1^2 + n_2^2 + 2 n_1^2 n_2 + 2 n_1 n_2^2 + 2 n_1 n_2}}{n_1 n_2} \right\}$.

They are equidistant from the mode, a property also of the Pearson system of
frequency curves [24]. Also $\lim_{z \to \pm\infty} z^n \dfrac{d^n y(z)}{dz^n} = 0$.
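The statements about the peak and the points of inflection can be checked numerically. The sketch below assumes the readings $y(0) \approx \{\pi(1/n_1 + 1/n_2)\}^{-1/2}$ for (2.1) and $e^{2z} = [n_1 n_2 + n_1 + n_2 \pm \sqrt{n_1^2 + n_2^2 + 2n_1^2 n_2 + 2 n_1 n_2^2 + 2 n_1 n_2}]/(n_1 n_2)$ for (2.2); the helper names are hypothetical.

```python
import math


def z_density(z, n1, n2):
    """Ordinate of the z distribution, evaluated through logarithms."""
    log_b = math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)
    return math.exp(math.log(2) + n1 / 2 * math.log(n1) + n2 / 2 * math.log(n2)
                    - log_b + n1 * z
                    - (n1 + n2) / 2 * math.log(n1 * math.exp(2 * z) + n2))


def peak_approx(n1, n2):
    """Large-sample approximation to the maximum ordinate y(0)."""
    return (math.pi * (1 / n1 + 1 / n2)) ** -0.5


def inflection_points(n1, n2):
    """The two inflection abscissas, returned as (lower, upper)."""
    p = n1 * n2
    root = math.sqrt(n1 ** 2 + n2 ** 2 + 2 * n1 ** 2 * n2 + 2 * n1 * n2 ** 2 + 2 * p)
    lower = 0.5 * math.log((p + n1 + n2 - root) / p)
    upper = 0.5 * math.log((p + n1 + n2 + root) / p)
    return lower, upper


def second_difference(z, n1, n2, h=1e-3):
    """Central-difference estimate of the second derivative of y at z."""
    return (z_density(z + h, n1, n2) - 2 * z_density(z, n1, n2)
            + z_density(z - h, n1, n2)) / h ** 2


low, high = inflection_points(4, 8)
```

Because the two values of $e^{2z}$ have product one, the two abscissas are equal and opposite, which is the "equidistant from the mode" property.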

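3. The moment generating function and seminvariants. The moment generating
function of the z distribution is

(3.1) $M_z(\theta) = \left(\dfrac{n_2}{n_1}\right)^{\theta/2} \dfrac{B\left(\frac{n_2 - \theta}{2}, \frac{n_1 + \theta}{2}\right)}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} = \left(\dfrac{n_2}{n_1}\right)^{\theta/2} \dfrac{\Gamma\left(\frac{n_2 - \theta}{2}\right) \Gamma\left(\frac{n_1 + \theta}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right) \Gamma\left(\frac{n_2}{2}\right)}$.

The seminvariants of Thiele are defined by the following identity in $\theta$:

(3.2) $\log M_z(\theta) = \lambda_1 \theta + \lambda_2 \dfrac{\theta^2}{2!} + \lambda_3 \dfrac{\theta^3}{3!} + \lambda_4 \dfrac{\theta^4}{4!} + \cdots$.

To find $\lambda_r$ we take the logarithm of the moment generating function, expand it
in powers of $\theta$ and choose the coefficient of $\theta^r / r!$. A complete discussion of
properties of seminvariants may be found elsewhere [4].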


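4. The seminvariants of z. Now by the following formulas [11] p. 38:

(4.1) $\log \Gamma(1 + x) = -s_1 x + \dfrac{s_2 x^2}{2} - \dfrac{s_3 x^3}{3} + \dfrac{s_4 x^4}{4} - \cdots$, $|x| < 1$,

(4.2) $\log \Gamma(1 - x) = s_1 x + \dfrac{s_2 x^2}{2} + \dfrac{s_3 x^3}{3} + \dfrac{s_4 x^4}{4} + \cdots$, $|x| < 1$,

where in both formulas

$s_1 = \lim_{n \to \infty} \left(1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \cdots + \tfrac{1}{n} - \log n\right)$,

$s_n = 1 + \dfrac{1}{2^n} + \dfrac{1}{3^n} + \dfrac{1}{4^n} + \cdots$, $n \ge 2$.

(4.3) $\log B\left(\tfrac{1}{2}[1 + x], \tfrac{1}{2}\right) = \log \pi - \sigma_1 x + \dfrac{\sigma_2 x^2}{2} - \dfrac{\sigma_3 x^3}{3} + \dfrac{\sigma_4 x^4}{4} - \cdots$, $|x| < 1$,

where

$\sigma_n = \dfrac{1}{1^n} - \dfrac{1}{2^n} + \dfrac{1}{3^n} - \dfrac{1}{4^n} + \cdots$, $n \ge 1$,

$\sigma_n = \left(1 - \dfrac{1}{2^{n-1}}\right) s_n$, $n \ge 2$.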


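Hence from (4.1) and (4.3)

(4.4) $\log \Gamma\left(\dfrac{1 + x}{2}\right) = \tfrac{1}{2} \log \pi - x\left(\sigma_1 + \dfrac{s_1}{2}\right) + \dfrac{x^2}{2}\left(\sigma_2 + \dfrac{s_2}{4}\right) - \dfrac{x^3}{3}\left(\sigma_3 + \dfrac{s_3}{8}\right) + \cdots$.

Since $\sigma_n = \left(1 - \dfrac{1}{2^{n-1}}\right) s_n$, $n \ge 2$, we may write (4.4) as

(4.5) $\log \Gamma\left(\dfrac{1 + x}{2}\right) = \tfrac{1}{2} \log \pi - x\left(\sigma_1 + \dfrac{s_1}{2}\right) + \sum_{k=2}^{\infty} \dfrac{(-1)^k x^k}{k}\left(1 - \dfrac{1}{2^k}\right) s_k$.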

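From (3.1)

(4.6) $\log M_z(\theta) = \log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) + \log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) + \dfrac{\theta}{2}(\log n_2 - \log n_1) - \log \Gamma\left(\dfrac{n_1}{2}\right) - \log \Gamma\left(\dfrac{n_2}{2}\right)$.

The results assume slightly different forms for (A) $n_1$ and $n_2$ each even; (B) $n_1$
and $n_2$ each odd; (C) $n_1$ even, $n_2$ odd; (D) $n_1$ odd, $n_2$ even. The general formula
for $\lambda_{r:z}$ for all cases is

(4.7) $\lambda_{r:z} = \sum_{k=0}^{\infty} \left[ \dfrac{(-1)^r (r-1)!}{(n_1 + 2k)^r} + \dfrac{(r-1)!}{(n_2 + 2k)^r} \right]$, $r \ge 2$.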


This result is not so useful from the point of view of numerical applications as 
the formulas which follow. 
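A direct evaluation of the general formula can nevertheless serve as a check on the case-by-case results. The sketch below reads (4.7) as $\lambda_{r:z} = (r-1)! \sum_k [(-1)^r/(n_1+2k)^r + 1/(n_2+2k)^r]$ and obtains the mean from the digamma function at half-integer arguments; the tail correction and all function names are assumptions made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # s_1 in the paper's notation


def tail_sum(n, r, head=2000):
    """Sum over k >= 0 of 1/(n + 2k)^r for r >= 2.

    The first `head` terms are added directly; the remainder is approximated
    by the integral of the summand plus half of its first omitted term.
    """
    direct = sum(1.0 / (n + 2 * k) ** r for k in range(head))
    m = n + 2 * head
    tail = m ** (1 - r) / (2.0 * (r - 1)) + 0.5 * m ** (-r)
    return direct + tail


def seminvariant(r, n1, n2):
    """Seminvariant of order r >= 2 of the z distribution."""
    if r < 2:
        raise ValueError("use mean_z for r = 1")
    return math.factorial(r - 1) * ((-1) ** r * tail_sum(n1, r) + tail_sum(n2, r))


def digamma_half(n):
    """Digamma function at n/2 for a positive integer n."""
    if n % 2 == 0:
        return -EULER_GAMMA + sum(1.0 / k for k in range(1, n // 2))
    return (-EULER_GAMMA - 2.0 * math.log(2.0)
            + 2.0 * sum(1.0 / (2 * k + 1) for k in range((n - 1) // 2)))


def mean_z(n1, n2):
    """First seminvariant (mean) of z."""
    return 0.5 * ((math.log(n2) - digamma_half(n2))
                  - (math.log(n1) - digamma_half(n1)))
```

For $n_1 = 4$, $n_2 = 8$ these functions reproduce the values quoted later in the paper (mean -.0701, standard deviation .4819, third seminvariant -.0405, fourth .0336).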


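5. Case A, $n_1$ and $n_2$ each even. From (4.6)

(5.1) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = \log\left(\dfrac{n_2 - 2 - \theta}{2}\right) + \log\left(\dfrac{n_2 - 4 - \theta}{2}\right) + \cdots + \log\left(1 - \dfrac{\theta}{2}\right) + \log \Gamma\left(1 - \dfrac{\theta}{2}\right)$.

Now $\log\left(1 - \dfrac{\theta}{n_2 - 2}\right) = -\sum_{k=1}^{\infty} \dfrac{1}{k}\left(\dfrac{\theta}{n_2 - 2}\right)^k$. There will be $\dfrac{n_2}{2} - 1$ series of
this sort, and only one series of the type $\log \Gamma\left(1 - \dfrac{\theta}{2}\right) = \sum_{k=1}^{\infty} \dfrac{s_k}{k}\left(\dfrac{\theta}{2}\right)^k$, given by
(4.2). In the above expansion and those succeeding, terms not involving $\theta$ are
omitted, since such terms are not needed in finding the seminvariants of z. The
series $\log \Gamma\left(1 - \dfrac{\theta}{2}\right)$ will always occur. Then

(5.2) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = -\sum_{k=1}^{\infty} \dfrac{\theta^k}{k}\left[\dfrac{1}{(n_2 - 2)^k} + \dfrac{1}{(n_2 - 4)^k} + \cdots + \dfrac{1}{2^k}\right] + \sum_{k=1}^{\infty} \dfrac{s_k}{k}\left(\dfrac{\theta}{2}\right)^k$

or

(5.3) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = \sum_{k=1}^{\infty} \dfrac{s_k}{k}\left(\dfrac{\theta}{2}\right)^k - \sum_{k=1}^{\infty} \sum_{j=1}^{(n_2 - 2)/2} \dfrac{1}{k}\left(\dfrac{\theta}{2j}\right)^k$.

We remark that the double sum is zero if $n_2 = 2$. Similarly

(5.4) $\log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) = \log\left(\dfrac{n_1 - 2 + \theta}{2}\right) + \log\left(\dfrac{n_1 - 4 + \theta}{2}\right) + \cdots + \log\left(1 + \dfrac{\theta}{2}\right) + \log \Gamma\left(1 + \dfrac{\theta}{2}\right)$

or

(5.5) $\log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) = \sum_{k=1}^{\infty} \dfrac{(-1)^k s_k}{k}\left(\dfrac{\theta}{2}\right)^k - \sum_{k=1}^{\infty} \sum_{j=1}^{(n_1 - 2)/2} \dfrac{(-1)^k}{k}\left(\dfrac{\theta}{2j}\right)^k$.

By use of (5.3) and (5.5) we have for the seminvariants of z, when $n_1$ and $n_2$
are even

(5.6) $\lambda_{r:z} = \dfrac{(r-1)!}{2^r}\left\{\left[s_r - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^r}\right] + (-1)^r\left[s_r - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^r}\right]\right\}$, $r \ge 2$.

For $\lambda_{1:z} = \bar{z}$ we have by (4.6), (4.3), and (4.5)

(5.7) $\lambda_{1:z} = \dfrac{1}{2}\left[\left(\log n_2 - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k}\right) - \left(\log n_1 - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k}\right)\right]$.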

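6. Case B, $n_1$ and $n_2$ odd. We have

(6.1) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = \log\left(\dfrac{n_2 - 2 - \theta}{2}\right) + \log\left(\dfrac{n_2 - 4 - \theta}{2}\right) + \cdots + \log\left(\dfrac{1 - \theta}{2}\right) + \log \Gamma\left(\dfrac{1 - \theta}{2}\right)$.

Expanding $\log \Gamma\left(\dfrac{1 - \theta}{2}\right)$ by (4.5)

(6.2) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = -\sum_{k=1}^{\infty}\left[\dfrac{\theta^k}{k(n_2 - 2)^k} + \dfrac{\theta^k}{k(n_2 - 4)^k} + \cdots + \dfrac{\theta^k}{k}\right] + \theta\left(\sigma_1 + \dfrac{s_1}{2}\right) + \sum_{k=2}^{\infty} \dfrac{\theta^k}{k}\left(1 - \dfrac{1}{2^k}\right) s_k$.

However $s_k\left(1 - \dfrac{1}{2^k}\right) = \dfrac{1}{1^k} + \dfrac{1}{3^k} + \dfrac{1}{5^k} + \cdots$, k > 1, which we shall denote
hereafter by $t_k$. Hence (6.2) becomes

(6.3) $\log \Gamma\left(\dfrac{n_2 - \theta}{2}\right) = \theta\left(\sigma_1 + \dfrac{s_1}{2}\right) + \sum_{k=2}^{\infty} \dfrac{t_k}{k} \theta^k - \sum_{k=1}^{\infty} \sum_{j=0}^{(n_2 - 3)/2} \dfrac{1}{k}\left(\dfrac{\theta}{2j + 1}\right)^k$.

Also

(6.4) $\log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) = \log\left(\dfrac{n_1 + \theta - 2}{2}\right) + \log\left(\dfrac{n_1 + \theta - 4}{2}\right) + \cdots + \log\left(\dfrac{1 + \theta}{2}\right) + \log \Gamma\left(\dfrac{1 + \theta}{2}\right)$,

and

(6.5) $\log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) = \sum_{k=1}^{\infty} \dfrac{(-1)^{k-1}}{k}\left[\dfrac{\theta^k}{(n_1 - 2)^k} + \dfrac{\theta^k}{(n_1 - 4)^k} + \cdots + \theta^k\right] - \theta\left(\sigma_1 + \dfrac{s_1}{2}\right) + \sum_{k=2}^{\infty} \dfrac{(-1)^k \theta^k}{k}\left(1 - \dfrac{1}{2^k}\right) s_k$,

(6.6) $\log \Gamma\left(\dfrac{n_1 + \theta}{2}\right) = -\theta\left(\sigma_1 + \dfrac{s_1}{2}\right) + \sum_{k=2}^{\infty} \dfrac{(-1)^k t_k}{k} \theta^k + \sum_{k=1}^{\infty} \sum_{j=0}^{(n_1 - 3)/2} \dfrac{(-1)^{k-1}}{k} \dfrac{\theta^k}{(2j + 1)^k}$.

Combining both these results (6.3) and (6.6) we have

(6.7) $\lambda_{r:z} = (r-1)!\left\{\left[t_r - \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^r}\right] + (-1)^r\left[t_r - \sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^r}\right]\right\}$, $r \ge 2$.

(6.8) $\lambda_{1:z} = \bar{z} = \left(\dfrac{1}{2} \log n_2 - \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{2k+1}\right) - \left(\dfrac{1}{2} \log n_1 - \sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{2k+1}\right)$.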

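7. Cases C, D, and values of $s_k$, $\sigma_k$, $t_k$. The formulas for case C, $n_1$ even,
$n_2$ odd are

(7.1) $\lambda_{r:z} = (r-1)!\left\{\left(t_r - \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^r}\right) + \dfrac{(-1)^r}{2^r}\left(s_r - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^r}\right)\right\}$, $r \ge 2$.

(7.2) $\lambda_{1:z} = \left(\dfrac{1}{2} \log n_2 + \log 2 - \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{2k+1}\right) - \dfrac{1}{2}\left(\log n_1 - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k}\right)$.

The results for case D, $n_1$ odd, $n_2$ even are

(7.3) $\lambda_{r:z} = (r-1)!\left\{\dfrac{1}{2^r}\left(s_r - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^r}\right) + (-1)^r\left(t_r - \sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^r}\right)\right\}$, $r \ge 2$.

(7.4) $\lambda_{1:z} = \dfrac{1}{2}\left(\log n_2 - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k}\right) - \left(\dfrac{1}{2} \log n_1 + \log 2 - \sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{2k+1}\right)$.
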
We list the numerical values of $s_k$ and $t_k$, $k \le 10$. The values of $s_k$ are from
Stieltjes [20].

(7.5)
$s_1$ = 0.57721 56649        $s_6$ = 1.01734 30620
$s_2$ = 1.64493 40668        $s_7$ = 1.00834 92774
$s_3$ = 1.20205 69032        $s_8$ = 1.00407 73562
$s_4$ = 1.08232 32337        $s_9$ = 1.00200 83928
$s_5$ = 1.03692 77551        $s_{10}$ = 1.00099 45751

(7.6)
$\sigma_1$ = log 2 = 0.69314 71806        $t_6$ = 1.00144 70767
$t_2$ = 1.23370 05501        $t_7$ = 1.00047 15487
$t_3$ = 1.05179 97903        $t_8$ = 1.00015 51790
$t_4$ = 1.01467 80316        $t_9$ = 1.00005 13452
$t_5$ = 1.00452 37628        $t_{10}$ = 1.00001 70413

By means of the formula $t_k = s_k\left(1 - \dfrac{1}{2^k}\right)$, k > 1, $t_k$ was calculated from $s_k$.

From the well known results for the Zeta function of Riemann $\zeta(s)$, [22], (p. 265,
p. 267),

(7.7) $\zeta(s) = \sum_{n=1}^{\infty} \dfrac{1}{n^s}$, s > 1,



(7.8) 
^n*’ “ d - 

(7.9) $t_s = \zeta(s)\left(1 - \dfrac{1}{2^s}\right)$.
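The relation between the two tables is easy to verify. The following sketch recomputes each $t_k$ from the tabulated $s_k$ by $t_k = s_k(1 - 2^{-k})$ and compares two of them with the closed forms $\pi^2/8$ and $\pi^4/96$; the dictionary and function names are illustrative only.

```python
import math

# Values of s_k = zeta(k) as tabulated in (7.5), k = 2, ..., 10.
S_TABLE = {
    2: 1.6449340668, 3: 1.2020569032, 4: 1.0823232337,
    5: 1.0369277551, 6: 1.0173430620, 7: 1.0083492774,
    8: 1.0040773562, 9: 1.0020083928, 10: 1.0009945751,
}


def t_from_s(k):
    """Sum of reciprocal k-th powers of the odd integers, from s_k."""
    return S_TABLE[k] * (1.0 - 2.0 ** -k)


T_TABLE = {k: t_from_s(k) for k in S_TABLE}
```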

8. The mean of the z distribution. From our previous formulas for $\bar{z}$ we
prove that if $n_1 = n_2$, $\bar{z} = 0$, and $\bar{z} < 0$ for $n_2 > n_1$, $\bar{z} > 0$ for $n_1 > n_2$. The
maximum absolute value of $\lambda_{1:z}$ will occur when $n_1 = 1$, $n_2 = \infty$, or $n_1 = \infty$, $n_2 = 1$,
and from (7.4) or (6.8) we have max $|\lambda_{1:z}| = \dfrac{s_1}{2} + \dfrac{1}{2} \log 2 = .6352$.


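9. Formulas for $\lambda_{2:z}$, $\mu_{2:z}$, $\lambda_{3:z}$, $\mu_{3:z}$, $\lambda_{4:z}$, and $\mu_{4:z}$. We have four cases from
(5.6), (6.7), (7.1), (7.3):

(9.1) $\lambda_{2:z} = .822467 - \dfrac{1}{4}\left(\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^2} + \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^2}\right)$, $n_1$, $n_2$ even.

(9.2) $\lambda_{2:z} = 2.467401 - \left(\sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^2} + \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^2}\right)$, $n_1$, $n_2$ odd.

(9.3) $\lambda_{2:z} = 1.644934 - \left(\dfrac{1}{4}\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^2} + \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^2}\right)$, $n_1$ even, $n_2$ odd.

(9.4) $\lambda_{2:z} = 1.644934 - \left(\sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^2} + \dfrac{1}{4}\sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^2}\right)$, $n_1$ odd, $n_2$ even.

In all cases of course $\lambda_{2:z} > 0$ and moreover $\lambda_{2:z} \to 0$ as $n_1$ and $n_2 \to \infty$. We list

(9.5) $\lambda_{3:z} = \dfrac{1}{4}\left(\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^3} - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^3}\right)$, $n_1$, $n_2$ even.

(9.6) $\lambda_{3:z} = 2\left(\sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^3} - \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^3}\right)$, $n_1$, $n_2$ odd.

(9.7) $\lambda_{3:z} = 1.803085 + \dfrac{1}{4}\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^3} - 2\sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^3}$, $n_1$ even, $n_2$ odd.

(9.8) $\lambda_{3:z} = -1.803085 + 2\sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^3} - \dfrac{1}{4}\sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^3}$, $n_1$ odd, $n_2$ even.

(9.9) $\lambda_{4:z} = .811742 - \dfrac{3}{8}\left(\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^4} + \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^4}\right)$, $n_1$, $n_2$ even.

(9.10) $\lambda_{4:z} = 12.176136 - 6\left(\sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^4} + \sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^4}\right)$, $n_1$, $n_2$ odd.

(9.11) $\lambda_{4:z} = 6.493939 - 6\left(\sum_{k=0}^{(n_2 - 3)/2} \dfrac{1}{(2k+1)^4} + \dfrac{1}{16}\sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^4}\right)$, $n_1$ even, $n_2$ odd.

(9.12) $\lambda_{4:z} = 6.493939 - 6\left(\dfrac{1}{16}\sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^4} + \sum_{k=0}^{(n_1 - 3)/2} \dfrac{1}{(2k+1)^4}\right)$, $n_1$ odd, $n_2$ even.

We see $\lambda_{r:z} > 0$ whenever r is even. If r is odd $\lambda_{r:z} < 0$ if $n_2 > n_1$, and $\lambda_{r:z} > 0$
if $n_1 > n_2$. Also $\mu_{r:z} > 0$, $n_1 > n_2$, r odd, greater than one. Similarly $\mu_{r:z} < 0$,
r odd > 1, $n_2 > n_1$.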

10. Skewness, excess, and values of $\alpha_n$. We take for our measure of skewness
$\alpha_3 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\lambda_3}{\lambda_2^{3/2}}$. For $n_2 > n_1$, $\alpha_3 < 0$. Further the skewness increases
negatively if $n_1$ remains constant as $n_2 \to \infty$. Thus negative skewness will be a
maximum for $n_2 = \infty$, $n_1 = 1$, and positive skewness will be a maximum when
$n_2 = 1$, $n_1 = \infty$. The absolute value of maximum $\alpha_3$ is

(10.1) $|\alpha_3| = \dfrac{2 t_3}{t_2^{3/2}} = 1.5351$.

As our measure of kurtosis we use $\alpha_4 = \dfrac{\mu_4}{\mu_2^2} = 3 + \dfrac{\lambda_4}{\lambda_2^2}$. As a measure of excess,
E, we use $E = \alpha_4 - 3 = \dfrac{\lambda_4}{\lambda_2^2}$. The excess is always positive.
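The limiting value in (10.1) follows from the odd-power sums $t_k$: with $n_1 = 1$ and $n_2 = \infty$ only the $n_1$ series survives, so that $\lambda_{r:z} = (-1)^r (r-1)!\, t_r$. A short sketch under that reading (variable names are illustrative):

```python
import math

# Sums of reciprocal powers of the odd integers.
T2 = math.pi ** 2 / 8            # t_2
T3 = (7.0 / 8.0) * 1.2020569032  # t_3 = (1 - 2^-3) * s_3
T4 = math.pi ** 4 / 96           # t_4

# Limiting case n1 = 1, n2 = infinity.
max_abs_skewness = 2.0 * T3 / T2 ** 1.5   # |lambda_3| / lambda_2^(3/2)
limiting_excess = 6.0 * T4 / T2 ** 2      # lambda_4 / lambda_2^2
```

Under the same assumption the excess in this limiting case comes out as exactly 4, since $6 t_4 / t_2^2 = 6 (\pi^4/96)/(\pi^4/64)$.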


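11. Approximations for $\lambda_{r:z}$ by the Euler-Maclaurin sum formula. The exact
results given previously for the seminvariants become unwieldy for $n_1$ and $n_2$
large. Hence we develop useful approximations for the seminvariants, and give
the maximum error of the approximation. We find first our results for $\lambda_{r:z}$
when $n_1$ and $n_2$ are even and r > 1. We begin with (5.6)

$\lambda_{r:z} = \dfrac{(r-1)!}{2^r}\left\{\left[s_r - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k^r}\right] + (-1)^r\left[s_r - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k^r}\right]\right\}$

and rewrite this as

(11.1) $\lambda_{r:z} = \dfrac{(r-1)!}{2^r}\left\{\sum_{k=n_2/2}^{\infty} \dfrac{1}{k^r} + (-1)^r \sum_{k=n_1/2}^{\infty} \dfrac{1}{k^r}\right\}$.

Now find the two sums of (11.1) by the Euler-Maclaurin sum formula [21]
using the first three terms, and obtain

(11.2) $\lambda_{r:z} = \dfrac{(r-2)!}{2}\left[\dfrac{n_2 + r - 1}{n_2^r} + (-1)^r \dfrac{n_1 + r - 1}{n_1^r} + \dfrac{r(r-1)}{3}\left(\dfrac{1}{n_2^{r+1}} + \dfrac{(-1)^r}{n_1^{r+1}}\right) - \dfrac{r(r-1)(r+1)(r+2)}{45}\left(\dfrac{1}{n_2^{r+3}} + \dfrac{(-1)^r}{n_1^{r+3}}\right)\right]$.

We use the following theorem [10] (p. 539), to find the error: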

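If f(x) is of constant sign for x > 0, and together with all of its derivatives,
tends monotonely to zero as $x \to \infty$, Euler's summation formula may be stated
in the simplified form

$f_0 + f_1 + \cdots + f_n = \int_0^n f(x)\, dx + \tfrac{1}{2}(f_0 + f_n) + \dfrac{B_2}{2!}(f'_n - f'_0) + \cdots + (-1)^{k-1} \dfrac{B_{2k}}{(2k)!}\left(f_n^{(2k-1)} - f_0^{(2k-1)}\right) + (-1)^k \theta \dfrac{B_{2k+2}}{(2k+2)!}\left(f_n^{(2k+1)} - f_0^{(2k+1)}\right)$,

where $0 < \theta < 1$ and $B_2 = 1/6$, $B_4 = 1/30$, $B_6 = 1/42$, $B_8 = 1/30$, $B_{10} = 5/66$,
etc. If we use

(11.3) $\lambda_{r:z} = \dfrac{(r-2)!}{2}\left(\dfrac{n_2 + r - 1}{n_2^r} + (-1)^r \dfrac{n_1 + r - 1}{n_1^r}\right)$,

then the error committed is of the same sign and less than

$\dfrac{r!}{3!}\left(\dfrac{1}{n_2^{r+1}} + \dfrac{(-1)^r}{n_1^{r+1}}\right)$.

If we take

(11.4) $\lambda_{r:z} = \dfrac{(r-2)!}{2}\left[\dfrac{n_2 + r - 1}{n_2^r} + (-1)^r \dfrac{n_1 + r - 1}{n_1^r} + \dfrac{r(r-1)}{3}\left(\dfrac{1}{n_2^{r+1}} + \dfrac{(-1)^r}{n_1^{r+1}}\right)\right]$,

then our error is less than, and has the same sign as

$-\dfrac{(r+2)!}{90}\left(\dfrac{1}{n_2^{r+3}} + \dfrac{(-1)^r}{n_1^{r+3}}\right)$.

Finally if we use (11.2), our error has the same sign as, and is less than

$\dfrac{(r+4)!}{945}\left(\dfrac{1}{n_2^{r+5}} + \dfrac{(-1)^r}{n_1^{r+5}}\right)$.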


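12. Approximations for other values of $n_1$ and $n_2$, r > 1. Now in case $n_1$
and $n_2$ are odd we have from (6.7)

(12.1) $\lambda_{r:z} = (r-1)!\left\{\sum_{k=(n_2 - 1)/2}^{\infty} \dfrac{1}{(2k+1)^r} + (-1)^r \sum_{k=(n_1 - 1)/2}^{\infty} \dfrac{1}{(2k+1)^r}\right\}$.

Applying the Euler-Maclaurin sum formula to each of the sums in (12.1) we
are led to exactly the same results given in paragraph (11). The other cases
are obvious combinations of the sums in (11.1) and (12.1), and so for all values
of $n_1$ and $n_2$ the approximate results for $\lambda_{r:z}$, r > 1 are

(12.2) $\lambda_{r:z} = \dfrac{(r-2)!}{2}\left[\dfrac{n_2 + r - 1}{n_2^r} + (-1)^r \dfrac{n_1 + r - 1}{n_1^r} + \dfrac{r(r-1)}{3}\left(\dfrac{1}{n_2^{r+1}} + \dfrac{(-1)^r}{n_1^{r+1}}\right) - \dfrac{r(r-1)(r+1)(r+2)}{45}\left(\dfrac{1}{n_2^{r+3}} + \dfrac{(-1)^r}{n_1^{r+3}}\right)\right]$.

Formulas (11.1) and (12.1) prove the result previously given for $\lambda_{r:z}$ (4.7).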
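The quality of the three-term approximation and of its error bound can be examined numerically. This sketch assumes the reading of (11.2) and (12.2) as $\frac{(r-2)!}{2}[\frac{n_2+r-1}{n_2^r} + (-1)^r\frac{n_1+r-1}{n_1^r} + \frac{r(r-1)}{3}(\cdots) - \frac{r(r-1)(r+1)(r+2)}{45}(\cdots)]$ and the bound $\frac{(r+4)!}{945}(n_2^{-r-5} + (-1)^r n_1^{-r-5})$; the "exact" value is taken from the series (4.7) with a small tail correction. All names are hypothetical.

```python
import math


def exact_seminvariant(r, n1, n2, head=2000):
    """Series (4.7), summed directly with an integral correction for the tail."""
    def tail_sum(n):
        direct = sum(1.0 / (n + 2 * k) ** r for k in range(head))
        m = n + 2 * head
        return direct + m ** (1 - r) / (2.0 * (r - 1)) + 0.5 * m ** (-r)
    return math.factorial(r - 1) * ((-1) ** r * tail_sum(n1) + tail_sum(n2))


def approx_seminvariant(r, n1, n2, terms=3):
    """Euler-Maclaurin approximation using one, two, or three terms."""
    def bracket(n, sign):
        value = (n + r - 1) / n ** r
        if terms >= 2:
            value += r * (r - 1) / 3 / n ** (r + 1)
        if terms >= 3:
            value -= r * (r - 1) * (r + 1) * (r + 2) / 45 / n ** (r + 3)
        return sign * value
    return math.factorial(r - 2) / 2 * (bracket(n2, 1) + bracket(n1, (-1) ** r))


def error_bound(r, n1, n2):
    """Signed bound on the error of the three-term approximation."""
    return math.factorial(r + 4) / 945 * (1 / n2 ** (r + 5) + (-1) ** r / n1 ** (r + 5))


# Errors (exact minus approximate) for n1 = 4, n2 = 8 and r = 2, 3, 4.
errors = {r: exact_seminvariant(r, 4, 8) - approx_seminvariant(r, 4, 8)
          for r in (2, 3, 4)}
```

Even at these small values of $n_1$ and $n_2$ the error stays inside the stated bound and carries its sign.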


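13. The approximate values of $\lambda_{1:z}$. From (5.7)

$\lambda_{1:z} = \dfrac{1}{2}\left[\left(\log n_2 - \sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k}\right) - \left(\log n_1 - \sum_{k=1}^{(n_1 - 2)/2} \dfrac{1}{k}\right)\right]$.

We use the Euler-Maclaurin sum formula on the sum

$\sum_{k=1}^{(n_2 - 2)/2} \dfrac{1}{k} = \sum_{k=0}^{(n_2 - 2)/2} \dfrac{1}{k + 1} - \dfrac{2}{n_2}$

and the similar sum involved in $\lambda_{1:z}$. Hence we have

(13.1) $\lambda_{1:z} = \dfrac{1}{2}\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right) + \dfrac{1}{6}\left(\dfrac{1}{n_2^2} - \dfrac{1}{n_1^2}\right) - \dfrac{1}{15}\left(\dfrac{1}{n_2^4} - \dfrac{1}{n_1^4}\right)$,

$n_1$ and $n_2$ even, $n_1$, $n_2 > 2$.

The errors committed by using one, two, or three terms of (13.1) are less than,
and of the same sign respectively as

$\dfrac{1}{6}\left(\dfrac{1}{n_2^2} - \dfrac{1}{n_1^2}\right)$, $\quad -\dfrac{1}{15}\left(\dfrac{1}{n_2^4} - \dfrac{1}{n_1^4}\right)$, $\quad \dfrac{8}{63}\left(\dfrac{1}{n_2^6} - \dfrac{1}{n_1^6}\right)$.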

For $n_1$ and $n_2$ both odd we find the same result as (13.1). The restriction $n_1$,
$n_2 > 2$ may easily be replaced by $n_1$, $n_2 \ge 2$ (for $n_1$, $n_2$ even) and $n_1$, $n_2 \ge 1$
(for $n_1$, $n_2$ both odd). When $n_1$ is odd, $n_2$ even, the formula is again the same
as (13.1) if $n_1$ and $n_2$ are sufficiently large; but if $n_1$ and $n_2$ are small we find
in this case



Another method of finding (12.2) would have been to use the asymptotic expression
for $\log \Gamma(z)$.


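14. Approximate values of $\lambda_{r:z}$ for values of r. We list the approximate
values of $\lambda_{r:z}$ to three terms.

(14.1)

$\lambda_{2:z} = \dfrac{1}{2}\left(\dfrac{n_2 + 1}{n_2^2} + \dfrac{n_1 + 1}{n_1^2}\right) + \dfrac{1}{3}\left(\dfrac{1}{n_2^3} + \dfrac{1}{n_1^3}\right) - \dfrac{4}{15}\left(\dfrac{1}{n_2^5} + \dfrac{1}{n_1^5}\right)$

$\lambda_{3:z} = \dfrac{1}{2}\left(\dfrac{n_2 + 2}{n_2^3} - \dfrac{n_1 + 2}{n_1^3}\right) + \left(\dfrac{1}{n_2^4} - \dfrac{1}{n_1^4}\right) - \dfrac{4}{3}\left(\dfrac{1}{n_2^6} - \dfrac{1}{n_1^6}\right)$

$\lambda_{4:z} = \left(\dfrac{n_2 + 3}{n_2^4} + \dfrac{n_1 + 3}{n_1^4}\right) + 4\left(\dfrac{1}{n_2^5} + \dfrac{1}{n_1^5}\right) - 8\left(\dfrac{1}{n_2^7} + \dfrac{1}{n_1^7}\right)$

The approximate values given by Cornish and Fisher [8] (p. 319), are similar,
but have fewer terms. Cornish and Fisher give no remainder term. From
(14.1) and (12.2) we see the maximum absolute values of $\lambda_{2r+1:z}$, $r \ge 1$, occur
when $n_2 = \infty$, $n_1 = 1$, or $n_2 = 1$, $n_1 = \infty$. Similarly $\lambda_{2r:z}$, $r \ge 1$, has its maximum
value for $n_1 = n_2 = 1$. The standard seminvariants of z are defined
$\xi_{r:z} = \dfrac{\lambda_{r:z}}{\lambda_{2:z}^{r/2}}$, $r \ge 2$. We also note that for $n_2 > n_1$, $\xi_{2r+1:z} < 0$, $r \ge 1$ and hence
$\alpha_{2r+1} < 0$ also where $\alpha_n = \dfrac{\mu_n}{\mu_2^{n/2}}$. Moreover the maximum absolute values of
$\xi_{2r:z}$ and $\xi_{2r+1:z}$ occur when $n_1 = 1$, $n_2 = \infty$ or $n_2 = 1$, $n_1 = \infty$; and also for $\alpha_{2r}$
and $\alpha_{2r+1}$. Approximately then

(14.2) max $\xi_{r:z} = (-1)^r \dfrac{(r-1)!\, t_r}{t_2^{r/2}}$, $r \ge 2$.

The exact value for maximum $\alpha_{4:z}$ is $3 + \dfrac{6 t_4}{t_2^2} = 7.00$.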


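15. Approach to normality of the z distribution. We prove the theorem: The
distribution of z approaches normality as $n_1$ and $n_2 \to \infty$ in any manner whatever,
with $\bar{z} = \dfrac{1}{2}\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$, $\sigma_z^2 = \dfrac{1}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$. We also find an upper bound
of the absolute value of the difference between the z distribution and the function
determined by the approximate seminvariants of z when $n_1$ and $n_2$ become
large. To prove the theorem we start with the original distribution of z, and
find when $n_1$ and $n_2$ are large,

(15.1) $P(z) = \left\{\pi\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{-1/2} \left(\dfrac{n_1 + n_2}{n_1 e^{2z} + n_2}\right)^{\frac{1}{2}(n_1 + n_2)} e^{n_1 z}\, dz$.

We change to standard units $z = t\sigma + \bar{z}$, then

(15.2) $P(t) = \dfrac{1}{\sqrt{2\pi}} \left(\dfrac{n_1 + n_2}{n_1 e^{2(t\sigma + \bar{z})} + n_2}\right)^{\frac{1}{2}(n_1 + n_2)} e^{n_1 (t\sigma + \bar{z})}\, dt$, $\quad -\infty < t < \infty$.

We rewrite this as

(15.3) $P(t) = \dfrac{1}{\sqrt{2\pi}} \left[\dfrac{n_1 + n_2}{n_1 e^{2 n_2 (t\sigma + \bar{z})/(n_1 + n_2)} + n_2 e^{-2 n_1 (t\sigma + \bar{z})/(n_1 + n_2)}}\right]^{\frac{1}{2}(n_1 + n_2)} dt$.

Expand $n_1 e^{2 n_2 (t\sigma + \bar{z})/(n_1 + n_2)}$ and $n_2 e^{-2 n_1 (t\sigma + \bar{z})/(n_1 + n_2)}$ and add term by term. Divide
this result by $n_1 + n_2$ from the numerator of P(t) to obtain

(15.4) $1 + \dfrac{2 n_1 n_2 (t\sigma + \bar{z})^2}{(n_1 + n_2)^2} + O\left(\dfrac{1}{(n_1 + n_2)^2}\right)$.

Hence

(15.5) $P(t) = \dfrac{1}{\sqrt{2\pi}} \left[1 + \dfrac{2 n_1 n_2 (t\sigma + \bar{z})^2}{(n_1 + n_2)^2}\right]^{-\frac{1}{2}(n_1 + n_2)} dt$.
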

We evaluate (15.5) for n x and n 2 large by using logarithms. 
+ nt /, , 2ni n 2 (<<r + z) 2 


log < 1 + 


(«i + nj) 2 

ni + n 2 r/2nin 2 (<a-+ z) 2 \ 1 (2n 1 w J (/<7 + z) 2 


: {2U\U^ 

.1 (n 1 


2 \ (nj + ns) 2 


+ £<-riq 2 ^ 


1 nailer + z) 2 V _ 
(ni + ns) 2 / _ ' 


This gives 


a 2 /t i s 0 < 3 _i_ s 2\ _l n > n s n ± ± V / t y (2n 1 n 2 (<(7 + z) 2 ) r 

_ T 1 + 2to + + 5T+igi (to + •> + S <" 0 "2^+75)^-' 


We reduce this then to 


f -i 5 . 
---a 2< 


(za- 1 ) 2 if 2n 2 n 2 ) (<<r + 2) 4 
2 + 2 \(ni + n 2 ) 2 / nj + n 2 


+ terms involved in the above summation. Let U = c l S < c. Since 

zV 2 U' 1 

lim <7 = 0, lim U = 0. Similarly lim —— = lim — = 0. Con- 

»l,n 2 -*«o * ni,n 2 -*ao ^ 

n?n 2 , s , 4 <7~ 4 (i<7 + 2) 4 (<+t/) 4 „ (t+U)* 

(ni + Rs)* 4(ni + n 2 ) 4(ni + ns) ni.» J -»»4(ni + n 2 ) 

0. In like fashion 

f (-1 ) r / 2n 1 n a V jtc + if = y. (-l) r or- 2r (<or + i)’ r 
7A 2r \»i + nsj (ni + ns)' -1 £» 2r(«i + n 2 ) r_1 

Now clearly from our previous discussion for r = 2, we see 
j. y' ( —l) y 9* (to -f i) % ' 
r—> 2r (Hi + Wt) r—1 



This completes the proof. 

We now consider the function, f(z), determined by the approximate seminvariants
of z. We start with


Xi;» 



and Xr:. = 


(r — 2) 1 f rtt + r — 1 
2 \ n» 


+ (-l) r 


wi + r — l \ 
n[ /’ 


v > 1, 


from (12.2) using only the first term. We may easily prove then that as $n_1$
and $n_2$ approach infinity in any manner whatever the function f(z) represents
a normal frequency distribution with


2 



and 



% + 1 

. nl 


+n- i v 

tti / 


This further shows the identity of f(z) and y(z) in the limit as n x and n* —► 
Since the moment generating function of f(z ) is 


we have 
(15.6) f(z) 


(i - ~Y ( * Yi + 

\ n 2 / \ nj 






i+*) 

n x ) 




de. 


00 . 


I have not been able to evaluate (15.6). We instead shall find an upper bound
to the difference $|f(z) - y(z)|$ as $n_1$ and $n_2$ become large. We form f(z) - y(z).
Then by use of Stirling's formula for n! with the remainder term and by the
Fourier Integral Theorem,

(15.7) | J(e) - y(z) | g (<J>' l * n ^ 1 *’" - 1 )y(e) v here 0 < ft < 1,0 < ft < 1, 


and 


(15.8) $\lim_{n_1, n_2 \to \infty} |f(z) - y(z)| = 0$, and for this case f(z) = y(z).

Of course (15.7) furnishes the upper bound of the absolute value between the
frequency distribution of z and the function determined by the approximate
seminvariants of z for any values of $n_1$ and $n_2$.

Up to this point we have assumed that there exists a function determined by 
the seminvariants 


Xl:» 



and X r :« 


(r-2)l / n t + r + l + + r 

2 \ nj nj 



r > 1. 


This may readily be proved by using the following theorem [18] (p. 536): The
determined character of the moments problem for an infinite interval is insured if
$\Sigma\, c_{2n}^{-1/(2n)}$ diverges $\left(c_n = \int x^n\, dF(x)\right)$.


16. The Pearson types of approximating curve. In discussing the types of
the Pearson system which may be expected to approximate the z distribution
we shall use the results of H. C. Carver [1], and the further exposition of C. C.
Craig [3]. To find the Pearson type we compute $\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$. We
shall find it convenient to use the approximations $\alpha_3 = \dfrac{\sqrt{2}\,(n_1 - n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}}$ and
$\alpha_4 = 3 + \dfrac{4(n_1^2 - n_1 n_2 + n_2^2)}{n_1 n_2 (n_1 + n_2)}$ to obtain

(16.1) $\delta = \dfrac{(n_1 + n_2)^2}{3 n_1^2 n_2 + 3 n_1 n_2^2 + 2 n_1^2 - 2 n_1 n_2 + 2 n_2^2}$

and consequently $0 < \delta \le \tfrac{1}{2}$. The only possibilities are Types IV, VII, VI,
or V since the greatest value of $\alpha_3^2$ by (14.1) is 2.3565. Now if $n_1 = n_2$, we have
Type VII, since $\alpha_3 = 0$, $\delta > 0$. In all other cases we shall have Types IV, V,
or VI according as $\alpha_3^2 < 4\delta(\delta + 2)$, $\alpha_3^2 = 4\delta(\delta + 2)$, $\alpha_3^2 > 4\delta(\delta + 2)$. We
neglect $\delta^2$. Hence $\alpha_3^2 < 8\delta$ implies

(16.2) $n_1^4 (n_2 - 2) + n_1^3 n_2 (15 n_2 + 6) + n_1^2 n_2^2 (15 n_2 - 8) + n_1 n_2^3 (n_2 + 6) - 2 n_2^4 > 0$.

A simple investigation reveals then the following results:

(16.3)
Type IV for $n_1$, $n_2 \ge 2$, $n_1 \ne n_2$.
Type IV for $n_1 = 1$, $1 \le n_2 \le 21$; or $n_2 = 1$, $1 \le n_1 \le 21$.
Type VI for $n_1 = 1$, $n_2 \ge 22$; for $n_2 = 1$, $n_1 \ge 22$.
Type VII for $n_1 = n_2$.

Clearly the z distribution has features comparable to Type IV since both have
infinite range. However, Type IV is irksome to fit in practice.
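The classification can be reproduced with a few lines of code. The sketch below assumes the approximations $\alpha_3^2 = 2(n_1 - n_2)^2/[n_1 n_2 (n_1 + n_2)]$ and $\delta = (n_1 + n_2)^2/(3n_1^2 n_2 + 3 n_1 n_2^2 + 2n_1^2 - 2n_1 n_2 + 2n_2^2)$, and follows the text in neglecting $\delta^2$, so that the boundary is $\alpha_3^2 = 8\delta$. Function names are illustrative.

```python
def alpha3_squared(n1, n2):
    """Approximate squared skewness of z."""
    return 2 * (n1 - n2) ** 2 / (n1 * n2 * (n1 + n2))


def delta(n1, n2):
    """Approximate Pearson criterion delta of (16.1)."""
    return (n1 + n2) ** 2 / (3 * n1 ** 2 * n2 + 3 * n1 * n2 ** 2
                             + 2 * n1 ** 2 - 2 * n1 * n2 + 2 * n2 ** 2)


def pearson_type(n1, n2):
    """Pearson type suggested for the z distribution, neglecting delta squared."""
    if n1 == n2:
        return "VII"
    a, d = alpha3_squared(n1, n2), delta(n1, n2)
    if a < 8 * d:
        return "IV"
    if a > 8 * d:
        return "VI"
    return "V"
```

With these approximations the change from Type IV to Type VI for $n_1 = 1$ occurs between $n_2 = 21$ and $n_2 = 22$, in agreement with (16.3).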


17. The Type III approximating curve, the logarithmic curve, and the
Gram-Charlier Type A. The criterion for Type III is $\delta = 0$, $\alpha_3 \ne 0$. We see
that as $n_1$ and $n_2$ increase the value of $\delta$ will decrease. Even for small values
of $n_1$ and $n_2$ Type III will furnish a fair approximation to the z distribution.
For example $n_1 = 10$, $n_2 = 5$, $\delta = .094$. The advantage of the Type III approximation
rests on the fact that Salvosa's tables may be used. From the chart in
[16] once $\alpha_3^2 \le 2.3566$, we are assured that the approximating Type III curve
is bell shaped. For $n_1 = 1, 2$, $n_2$ = any value, this approximation is not all
that could be desired, although even in such cases it does have value. We note
that Type III has limited range at one extreme while the range of
the z distribution is $(-\infty, \infty)$. Salvosa's tables extend as far as $\alpha_3 = 1.1$,
and since max $\alpha_3 = 1.5351$, we see in some cases, and these only for $n_1 = 1$,
$n_2$ large, we shall be obliged to make use of Pearson's Tables of the Incomplete
Gamma Function [14]. The logarithmic frequency curve

m - “ p [- ® ( log “"r 5 )'] 


will be useful in approximating the z distribution. While it has been discussed
by many authors we shall follow Pae-Tsi Yuan [23], where a full bibliography
may be found. In our discussion we use the $\beta_1 = \alpha_3^2$, $\beta_2 = \alpha_4$ chart of the
Pearson system as given by S. J. Pretorius [16] (p. 147), since the logarithmic
frequency locus connecting $\alpha_3^2$ and $\alpha_4$ is already drawn in. The justification of
this curve for fitting is that in this chart the logarithmic frequency locus
lies in the Type VI region between the Type III locus and the Type V locus,
and consequently closer to the Type IV region than Type III itself does. Hence
since Type III fits fairly well under certain conditions and Type IV fits well we
can expect the same for the logarithmic curve. Furthermore when $\alpha_3$ is small
the logarithmic curve is similar to Type III [23] (p. 42), and as $\alpha_3$ becomes
larger, $\alpha_3 = 1$, the difference between the two types is pronounced. However,
it is just when $\alpha_3$ becomes large in the region $n_1 = 1$, $n_2 \ge 22$ that we find the
logarithmic curves give a fine fit, since in such cases the point $(\alpha_3^2, \beta_2)$ lies practically
on the logarithmic locus [16]. To fit the curve [23] (pp. 37, 48, 49), we
find the values of the three parameters a, b, c. To find c we solve the equation
$w^3 + 3w^2 - (4 + \alpha_{3:z}^2) = 0$ for w using the table [23] (p. 48) given by Pae-Tsi
Yuan. Knowing w we can easily solve for

c = (log w)* , b = 

( 17 .D 

(u> + 2)<r, -1 

where the value of x must be obtained from the table of areas under the normal 
curve, if the e distribution is approximated by use of areas. 

Since the Gram-Charlier Type A series generally approximates a Pearson
Type IV fairly well when $\alpha_3^2$ is not too large, it is to be expected that the Type A
series will approximate the z distribution in those cases when $n_1 = n_2$, and also
when $\alpha_3^2$ is not too large.
18. Levels of significance and approximation methods. We shall apply the
results of the previous paragraphs to the determination of the value of z for
any level of significance $\alpha$, i.e. the value of z such that $\int_{-\infty}^{z} y(z)\, dz = 1 - \alpha$.
We have such levels as the median (the 50% point of significance), the 20%,
5%, 1%, and .1% points as given in [9]. Where these tables apply there is no
need for other methods. It would be desirable to extend the results for any
level of significance whatever. The methods which we shall use are (1) the
logarithmic frequency curve, (2) the Gram-Charlier Type A, and (3) the Type III
approximation. For finding the levels of significance by the Incomplete Beta
function, the reader is referred to [13], (p. lviii, topic (viii)). The logarithmic
curve is very simple to use in conjunction with the table of areas under the
normal curve. From Pae-Tsi Yuan we have

(18.1) $t = \dfrac{e^{cx - c^2/2} - 1}{(e^{c^2} - 1)^{1/2}}$, where $(e^{c^2} - 1)^{1/2}$

takes the same sign as $\alpha_3$. The value of x is obtained from the table of the
normal curve, 1.64 for the 5% level, 2.33 for the 1% level; the value of c is
obtained from w (17.1), and consequently the value of t (18.1). Then we have,
if $z_\alpha$ = value of z for any level of significance, $t = \dfrac{z_\alpha - \bar{z}}{\sigma_z}$ to solve for $z_\alpha$, where $\bar{z}$
and $\sigma_z$ are the values of the mean and standard deviation of z as given by the
proper formulas in (5), (6), (7). We illustrate with examples:

(18.2) 5% point of z, $n_1 = \infty$, $n_2 = 1$. $\alpha_3 = 1.5351$, w = 1.2264, x = 1.64,
t = 1.88, $\bar{z}$ = .6352, $\sigma_z$ = 1.11, and as a result $z_{5\%}$ = 2.72. Fisher [9] gives 2.7693.

We can also find $z_{5\%}$ easily for $n_1 = 1$, $n_2 = \infty$. Here $\alpha_3 = -1.5351$, w = 1.2264,
x = -1.64, t = 1.197, $\bar{z}$ = -.6352, $\sigma_z$ = 1.11, $z_{5\%}$ = .694 compared with
Fisher [9] $z_{5\%}$ = .6729.

(18.3) 1% point for $n_1 = 4$, $n_2 = 8$, $\bar{z}$ = -.0701, $\sigma_z$ = .4819, $\alpha_3$ = -.3619,
w = 1.0144, t = 2.17 and $z_{1\%}$ = .976, while the accurate result is .9734.

From experience the values of z for any level of significance obtained by the logarithmic
frequency curve will possess an error less than 2% of the true value of z
for the level of significance if $n_1$ and $n_2$ are greater than twenty. It would
seem that for other values of $n_1$ and $n_2$ the error could not be greater than 10%,
and usually would be much less.
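The computation in example (18.2) can be sketched as follows. The code assumes that w is the root of $w^3 + 3w^2 - (4 + \alpha_3^2) = 0$, that $c = (\log w)^{1/2}$, and that (18.1) reads $t = (e^{cx - c^2/2} - 1)/(e^{c^2} - 1)^{1/2}$ with x and the radical both taking the sign of $\alpha_3$. This reading reproduces the two cases of (18.2); it is offered as an interpretation, and the function names and the bisection solver (used in place of Yuan's table) are illustrative.

```python
import math


def solve_w(alpha3):
    """Root w >= 1 of w^3 + 3 w^2 = 4 + alpha3^2, found by bisection."""
    target = 4.0 + alpha3 ** 2
    lo, hi = 1.0, 1.0 + target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + 3.0 * mid ** 2 < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def standardized_point(alpha3, x_normal):
    """Standardized deviate t for an upper-tail normal deviate x_normal > 0."""
    if alpha3 == 0:
        return x_normal  # the curve reduces to the normal curve
    sign = 1.0 if alpha3 > 0 else -1.0
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))
    return (math.exp(sign * c * x_normal - 0.5 * c * c) - 1.0) / (sign * math.sqrt(w - 1.0))


def z_point(mean, sigma, alpha3, x_normal):
    """Approximate percentage point of z from its mean, sigma and skewness."""
    return mean + sigma * standardized_point(alpha3, x_normal)
```

For instance, `z_point(0.6352, 1.11, 1.5351, 1.64)` gives the 5% point for $n_1 = \infty$, $n_2 = 1$ under these assumptions.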


19. The Gram-Charlier Type A. We take the series in the form

$F(t) = \varphi(t) + A_3 \varphi^{(3)}(t) + A_4 \varphi^{(4)}(t)$, $\quad \varphi(t) = \dfrac{e^{-t^2/2}}{\sqrt{2\pi}}$.

Some examples follow.

(19.1) We use the material of (18.3) and employ three terms of F(t). $\bar{z}$ =
-.0701, $\sigma_z$ = .4819, $\lambda_{3:z}$ = -.0405, $\lambda_{4:z}$ = .0336, $A_3$ = .08032, $A_4$ = .02696.
Fitting F(t) by ordinates we have t = 2.17, and consequently z = .976.

(19.2) We take $n_1 = n_2 = 5$, $\bar{z}$ = 0, $\sigma_z$ = .4952, $\lambda_{3:z}$ = 0, $\lambda_{4:z}$ = .02798, $A_3$ = 0,
$A_4$ = .01939.

5% point: By ordinates t = 1.57, $z_{5\%}$ = .777, while Fisher gives .8097.

1% point: By ordinates t = 2.325, $z_{1\%}$ = 1.15, while Fisher gives 1.1974.

(19.3) We take $n_1 = 3$, $n_2 = 20$, $\bar{z}$ = -.15909, $\sigma_z$ = .5099, $\lambda_{3:z}$ = -.10222,
$\lambda_{4:z}$ = .08822, $A_3$ = .12854, $A_4$ = .05438. By ordinates t = 1.523, $z_{5\%}$ = .618,
Fisher gives .5654. t = 1.989, $z_{1\%}$ = .855, Fisher gives .7985. The Gram-Charlier
Type A is recommended only for $n_1 = n_2$ and $n_1$, $n_2 \ge 20$.


20. Type III approximation, the median, and 5% point. Since for Type III
the median, $m_e$, is approximately two-thirds of the distance from the mode
to the mean if $\alpha_3$ is moderate [12], [6], then we have, further assuming $n_1$,
$n_2 \ge 20$,

(20.1)

From experience this result will furnish an accuracy with an error less than 2%
of the true value in the range above indicated.

(20.2) $t_{5\%} = 1.6437 + .2760\alpha_3 - .04506\alpha_3^2$.

This was found by use of Salvosa's tables and for $\alpha_3 > 1.1$ by [14].

(20.3) $z_{5\%} = \sigma_z[1.644 + .2760\alpha_{3:z} - .0451\alpha_{3:z}^2] + \bar{z}$.

We illustrate the use of (20.3) with some examples.

(20.4) $n_1 = n_2 = 1$, $\sigma_z$ = 1.5706, $\alpha_{3:z}$ = 0, $\bar{z}$ = 0, $z_{5\%}$ = 2.582,
while the accurate value is $z_{5\%}$ = 2.5421.

(20.5) $n_1 = \infty$, $n_2 = 1$, $\alpha_3$ = 1.5351, $\bar{z}$ = .6352, $\sigma_z$ = 1.11, $z_{5\%}$ = 2.81. The
accurate value is 2.7693.

(20.6) $n_1 = n_2 = 5$, $\sigma_z$ = .4952, $\alpha_{3:z}$ = 0, $\bar{z}$ = 0, $z_{5\%}$ = .8141, while the
accurate value is $z_{5\%}$ = .8097.

(20.7) $n_1 = 4$, $n_2 = 8$, $\bar{z}$ = -.0701, $\sigma_z$ = .4819, $\alpha_3$ = -.3619, $z_{5\%}$ = .6712,
while the accurate value is .6725.

(20.8) $n_1 = 1$, $n_2 = 10$, $\bar{z}$ = -.5835, $\sigma_z$ = 1.1353, $\alpha_3$ = -1.4333, $z_{5\%}$ = .7283,
while the accurate value is .8012.
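Formula (20.3) is simple enough to check against the worked examples directly. The sketch below takes (20.3) as $z_{5\%} = \sigma_z[1.644 + .2760\alpha_3 - .0451\alpha_3^2] + \bar{z}$; the function name is illustrative.

```python
def z_five_percent(mean, sigma, alpha3):
    """Type III approximation (20.3) to the 5% point of z."""
    return sigma * (1.644 + 0.2760 * alpha3 - 0.0451 * alpha3 ** 2) + mean


# Inputs (mean, sigma, alpha3) taken from examples (20.4) to (20.8).
examples = {
    "20.4": z_five_percent(0.0, 1.5706, 0.0),
    "20.5": z_five_percent(0.6352, 1.11, 1.5351),
    "20.6": z_five_percent(0.0, 0.4952, 0.0),
    "20.7": z_five_percent(-0.0701, 0.4819, -0.3619),
    "20.8": z_five_percent(-0.5835, 1.1353, -1.4333),
}
```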

In a future paper exactly the same methods will be used for any per cent point
of z whatever in order to compare with the results of W. G. Cochran [2]. If
$n_1$ and $n_2$ are large we may use the approximate formulas for $\sigma_z$, $\alpha_{3:z}$, and $\bar{z}$
to obtain to the order of $\sigma_z^2$,

(20.9) $z_{5\%} = 1.644\sigma_z + .7760\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$, where $\sigma_z = \sqrt{\dfrac{1}{2}\left(\dfrac{1}{n_2} + \dfrac{1}{n_1}\right)}$.

We expand Fisher's result [9]

$z_{5\%} = \dfrac{1.6449}{\sqrt{h - 1}} + .7843\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$ by the binomial theorem, where $h = \dfrac{1}{\sigma_z^2}$, to
obtain a comparable result

(20.10) $z_{5\%} = 1.646\sigma_z + .7843\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$.

The numerical examples given in this chapter illustrate unfavorable cases as
well as favorable ones.


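21. The distribution of F. Historically Snedecor [19] was the first to use F
for $e^{2z}$. We find

(21.1) $P(F) = \dfrac{n_1^{n_1/2}\, n_2^{n_2/2}}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \cdot \dfrac{F^{\frac{1}{2} n_1 - 1}\, dF}{(n_1 F + n_2)^{\frac{1}{2}(n_1 + n_2)}}$, $\quad 0 \le F \le \infty$.

The distribution of F is J shaped if $n_1 \le 2$, and bell shaped for $n_1 > 2$, and for
$n_1 > 2$ one mode exists, $F_0 = \dfrac{n_2 (n_1 - 2)}{n_1 (n_2 + 2)}$. The two points of inflection, which
exist for $n_1 > 4$, are equidistant from the mode. The moments are

$\mu'_m = \left(\dfrac{n_2}{n_1}\right)^m \dfrac{\Gamma\left(\frac{n_1 + 2m}{2}\right) \Gamma\left(\frac{n_2 - 2m}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right) \Gamma\left(\frac{n_2}{2}\right)}$, $\quad n_2 > 2m$,

$\bar{F} = \dfrac{n_2}{n_2 - 2}$, $\quad n_2 > 2$,

$\alpha_{3:F} = \dfrac{2\sqrt{2}\,(2 n_1 + n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}}$ approximately,

$\mu_2 = \dfrac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)}$, approximately $2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$, $\quad n_2 > 4$.

The exact results for $\mu_3$, $\mu_4$, $\alpha_3$, and $\alpha_4$ are omitted because of length. We
have the theorem that as $n_1$, $n_2 \to \infty$ in any manner whatever the distribution
of F approaches normality with mean $\bar{F} = 1$, $\sigma_F = \left\{2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}$. The proof
is omitted. The only type of approximating curve of any value is Type III.
Of course the distribution of F is Type VI. No tables exist for Type VI.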
Furthermore the F distribution approaches the Type III function so slowly as
to make most approximations of little value unless $\alpha_{3:F} \le 1.1$. Other possible
parameters are 6 - ^ F, and H » - , [13). Since | a** | =* 

2 | as:* | approximately we see that the distributioa of H is more skewed than 
that of 2. We mention briefly also <S* — <S| where Sf = S- *x, S* '■» 4 • 

iVi Nf 

Clearly z, F, 6, and H give equivalent levels of significance. This is not true 
for z and 5* — S\. 

Finally, since $F = s_1^2/s_2^2$, it may be interpreted as a quotient [5]. When the
moments of F do not exist, it is due to the distribution function of $s_2^2$.
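The mode, the moments, and the conditions under which the moments fail to exist can be illustrated numerically. The sketch assumes the standard forms $\mu'_m = (n_2/n_1)^m\, \Gamma(n_1/2 + m)\Gamma(n_2/2 - m)/[\Gamma(n_1/2)\Gamma(n_2/2)]$ for $n_2 > 2m$ and $F_0 = n_2(n_1 - 2)/[n_1(n_2 + 2)]$; function names are hypothetical.

```python
import math


def f_density(F, n1, n2):
    """Ordinate of the F distribution (21.1) for F > 0."""
    log_b = math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)
    log_f = (n1 / 2 * math.log(n1) + n2 / 2 * math.log(n2) - log_b
             + (n1 / 2 - 1) * math.log(F)
             - (n1 + n2) / 2 * math.log(n1 * F + n2))
    return math.exp(log_f)


def f_mode(n1, n2):
    """Mode of F, which exists for n1 > 2."""
    if n1 <= 2:
        raise ValueError("the F curve is J shaped for n1 <= 2")
    return n2 * (n1 - 2) / (n1 * (n2 + 2))


def f_raw_moment(m, n1, n2):
    """m-th moment of F about the origin; it exists only when n2 > 2m."""
    if n2 <= 2 * m:
        raise ValueError("moment does not exist unless n2 > 2m")
    return math.exp(m * math.log(n2 / n1)
                    + math.lgamma(n1 / 2 + m) + math.lgamma(n2 / 2 - m)
                    - math.lgamma(n1 / 2) - math.lgamma(n2 / 2))


def f_variance(n1, n2):
    """Second moment of F about its mean, valid for n2 > 4."""
    return 2 * n2 ** 2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4))
```

The `ValueError` raised for $n_2 \le 2m$ mirrors the non-existence of the corresponding moment: the Gamma function in the numerator is evaluated at a non-positive argument.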

22. Conclusion. We have found the seminvariants for the z distribution, and 
approximations for them. Type III, and the logarithmic normal frequency 
functions are shown to be excellent approximations to the z distribution. The 
approach to normality for the z distribution is proved. A formula is given for 
finding the 5% level of significance for z . The F distribution is studied along 
the same lines. As far as the construction of tables for levels of significance is 
concerned, the z distribution is much easier to use. My sincerest thanks are 
due Professor C. C. Craig for his helpful guidance and many suggestions. 
THE DOOLITTLE TECHNIQUE

By Paul S. Dwyer
University of Michigan

1. Introduction. Most authors who have presented the Doolittle method,
from Doolittle [1] down to the present, have not given any formal proof that the
solution is valid in the general case. They usually are content with a form
describing the various steps of a Doolittle solution.

The author has recently shown [2] that the Doolittle method can be abbreviated
to a technique which is also an abbreviation, essentially, of the method of
single division and its abbreviation which Aitken called the “Method of Pivotal
Condensation” [3]. It appears at once that the validity of the Doolittle method
follows from the validity of the method of single division, a validity which is
readily established.

However one may desire a “proof” which is based directly on the Doolittle
technique without referring to other methods of solution. It is the chief
purpose of this paper to present such a proof. It is accomplished by the introduction
of a notation which precisely describes the conventional Doolittle
process and by proving that this process results in a system of equations whose
prediagonal terms are zero. It is a secondary purpose of the paper to emphasize
the advantages of the Abbreviated Doolittle method and to explain and illustrate
minor variations in the conventional Doolittle technique.

2. The Abbreviated Doolittle solution. We first direct our attention to the 
essential parts of a Doolittle solution and these are the last two rows of each 
matrix of the standard Doolittle presentation. The additional rows in the 
standard presentation are rows of products which are used solely for the purpose 
of finding the two bottom rows of each matrix and they need not be recorded, 
if a computing machine is available, since the essential information is present 
in the two bottom rows. Doolittle [1] did not have calculating machines (he 
used multiplication tables) but he put the important information in Table A 
and carefully segregated the supplementary information in Table B. With 
reference to this he wrote [1] 

“It is to be observed that the numbers in Table B have but a single use while 
those in Table A are used over and over, and where the number of equations is 
large, it is of great advantage that they should be thus tabulated by themselves 
in a form compact and easy of reference.” 

For purposes of proof, as well as for purposes of calculation if a computing 
machine is available, it is only necessary to utilize the forward part of the 
Abbreviated Doolittle solution which is the equivalent of the Doolittle Table A. 

A four variable illustration of the Abbreviated Doolittle technique is presented
in Table I. The successive equations are indicated by number, as is customary, 
and the operation which defines the equation is specified. The actual operation 
is indicated more explicitly by the notation of column 3 and this is discussed in 
the next section. 

The presentation of Table I introduces one variation from the standard
Doolittle method. The division is made by the diagonal coefficient of each row
rather than by its negative. One may still use the old technique, if he prefers,
but it is felt that one can subtract products as easily as he can add products with
modern machines equipped with automatic negative multiplication. In addition,
the entries of the equivalent rows then have the same signs and, too, it is
not necessary to take the time to change the signs of the second rows. This
variation uses the same division method as the method of single division [2]
and as the method of pivotal condensation [3] so that the abbreviated form of
these methods is, essentially, the same as the abbreviated form of the Doolittle
method.

The application of this technique leads at each step to a coefficient for each 
variable. However if the process is to lead from our four equations in four 
unknowns, to three in three, to two in two, to one in one, it follows that all the 
entries to the left of the diagonal, which we may call prediagonal entries, must 
be zero. That this is true in the general case is the objective of the proofs of 
later sections. 

3. A notation for and description of the Doolittle technique. A main
contribution of the present article is the use of a notation which describes the Doolittle
technique. As long as the Doolittle process is described loosely by means of
“operations” it is difficult to be precise in defining quantities which appear in
the calculation, but when a notation is used which is definite enough to permit
expansion in terms of the original coefficients, some sort of proof may be
available. The present notation bears some resemblance to that suggested by
Gauss [4], though Gauss used letters to indicate the primary subscripts and
numbers to indicate the number of secondary subscripts and his notation was
directly applicable to the sums of least squares theory rather than to symmetric
equations in general.

We wish to find the solution of the equations

(1) $\sum_{i=1}^{n} a_{ij} x_i = a_{n+1,j}, \qquad j = 1, 2, \ldots, n,$

where the matrix of the coefficients is symmetric. We do this by obtaining
auxiliary equations which feature a decreasing number of variables. No serious
restriction is made if we assume that the variables $x_1, x_2, x_3$, etc., are eliminated
successively. The Doolittle technique may then be described as follows:

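We take the first equation of (1) and divide by its leading coefficient, $a_{11}$, to get

(2) $\sum_{i=1}^{n} b_{i1} x_i = b_{n+1,1}$, where $b_{i1} = \dfrac{a_{i1}}{a_{11}}$,

and we then form

(3) $\sum_{i=1}^{n} a_{i2\cdot 1} x_i = a_{n+1,2\cdot 1}$ with $a_{i2\cdot 1} = a_{i2} - a_{i1} b_{21}$.

We then divide by $a_{22\cdot 1}$ and get

(4) $\sum_{i=1}^{n} b_{i2\cdot 1} x_i = b_{n+1,2\cdot 1}$ with $b_{i2\cdot 1} = \dfrac{a_{i2\cdot 1}}{a_{22\cdot 1}}$.

We next form

(5) $\sum_{i=1}^{n} a_{i3\cdot 12} x_i = a_{n+1,3\cdot 12}$ with $a_{i3\cdot 12} = a_{i3} - a_{i1} b_{31} - a_{i2\cdot 1} b_{32\cdot 1}$,

and

(6) $\sum_{i=1}^{n} b_{i3\cdot 12} x_i = b_{n+1,3\cdot 12}$ with $b_{i3\cdot 12} = \dfrac{a_{i3\cdot 12}}{a_{33\cdot 12}}$.

This process is continued so that, in general, we have

(7) $\sum_{i=1}^{n} a_{ij\cdot 12\cdots j-1}\, x_i = a_{n+1,j\cdot 12\cdots j-1}, \qquad j = 1, 2, \ldots, n,$

and

(8) $\sum_{i=1}^{n} b_{ij\cdot 12\cdots j-1}\, x_i = b_{n+1,j\cdot 12\cdots j-1}, \qquad j = 1, 2, \ldots, n,$

with

(9) $a_{ij\cdot 12\cdots j-1} = a_{ij} - a_{i1} b_{j1} - a_{i2\cdot 1} b_{j2\cdot 1} - a_{i3\cdot 12} b_{j3\cdot 12} - \cdots - a_{i,j-2\cdot 12\cdots j-3}\, b_{j,j-2\cdot 12\cdots j-3} - a_{i,j-1\cdot 12\cdots j-2}\, b_{j,j-1\cdot 12\cdots j-2}$

and

(10) $b_{ij\cdot 12\cdots j-1} = \dfrac{a_{ij\cdot 12\cdots j-1}}{a_{jj\cdot 12\cdots j-1}}$.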

It is to be noted that the n equations (1) are transformed by this process to 
the n auxiliary equations of (7) or (8). The solutions of (1) are also solutions
of these auxiliary equations since the auxiliary equations are linear combinations 
of (1). It is our purpose to show that the prediagonal coefficients of these 
auxiliary equations are always 0 so that these auxiliary equations feature a 
decreasing number of variables. 
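
The forward process of formulas (7) to (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original presentation: the function name `forward_doolittle` and the row layout (each row holds the coefficients of $x_1, \ldots, x_n$ followed by the constant term) are choices made here, and the data are the four equations of the numerical illustration in Table II.

```python
def forward_doolittle(aug):
    """Forward part of the Abbreviated Doolittle solution (hypothetical helper).

    aug[j] holds the coefficients a_{1j}, ..., a_{nj} of equation j+1 followed
    by its constant term a_{n+1,j}.  Returns the "a" rows of (7) and the "b"
    rows of (8), computed from formulas (9) and (10).
    """
    n = len(aug)
    a_rows, b_rows = [], []
    for j in range(n):
        row = list(aug[j])
        for k in range(j):
            # Formula (9): subtract a_{ik.12...k-1} * b_{jk.12...k-1}
            b_jk = b_rows[k][j]
            row = [value - a_rows[k][i] * b_jk for i, value in enumerate(row)]
        a_rows.append(row)
        # Formula (10): divide by the diagonal coefficient (not its negative)
        b_rows.append([value / row[j] for value in row])
    return a_rows, b_rows


# Coefficients and constant terms of the illustration in Table II
equations = [
    [1.0, 0.4, 0.5, 0.6, 0.2],
    [0.4, 1.0, 0.3, 0.4, 0.4],
    [0.5, 0.3, 1.0, 0.2, 0.6],
    [0.6, 0.4, 0.2, 1.0, 0.8],
]
a_rows, b_rows = forward_doolittle(equations)
for row in a_rows:
    print([round(value, 4) for value in row])
```

Up to rounding error, the entries to the left of the diagonal in each printed row come out as zero, which is the property the later sections prove in general.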

We may use the term primary subscripts to indicate the first two subscripts 
and the term secondary subscripts to indicate the later subscripts which specify 
the order of elimination of the variables. The “order” of the coefficient is then 
equal to the number of secondary subscripts. 
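The formula (9) gives the matrix of the final Doolittle set of equations. At
each stage of the reduction one can write down a formula for all the elements
in the matrix at that stage. Thus one can write the coefficients of order h,
$a_{ij\cdot 12\cdots h}$, in terms of coefficients of order less than h,

(11) $a_{ij\cdot 12\cdots h} = a_{ij} - a_{i1} b_{j1} - a_{i2\cdot 1} b_{j2\cdot 1} - \cdots - a_{ih\cdot 12\cdots h-1}\, b_{jh\cdot 12\cdots h-1}.$

It follows at once that

(12) $a_{ij\cdot 12\cdots h} = a_{ij\cdot 12\cdots h-1} - a_{ih\cdot 12\cdots h-1}\, b_{jh\cdot 12\cdots h-1} = a_{ij\cdot 12\cdots h-1} - \dfrac{a_{ih\cdot 12\cdots h-1}\, a_{jh\cdot 12\cdots h-1}}{a_{hh\cdot 12\cdots h-1}}.$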


4. Some theorems on the interchangeability of subscripts. Our main objective
is to prove that the prediagonal terms are zero. In order to do this we first
prove some theorems dealing with the primary and secondary subscripts.

Theorem 1: The value of $a_{ij\cdot 12\cdots h}$ is not changed if the primary subscripts are
interchanged. This theorem, which might be stated “The matrix of the coefficients
of a given order is symmetric,” follows from the symmetry of the matrix
of coefficients of zero order. We can show that the symmetry of the matrix
having coefficients of order h follows at once from the symmetry of the matrix
having coefficients of order $h - 1$ by comparing the value $a_{ij\cdot 12\cdots h}$ with that of
$a_{ji\cdot 12\cdots h}$ obtained by dual substitution in (12). Since the matrix of zero order
coefficients is symmetric by hypothesis, it follows that the matrices of the
coefficients of order 1, 2, 3, 4, etc., are in turn symmetric.

Theorem 2: Any pair of consecutive secondary subscripts may be interchanged
without changing the value of the coefficient. This theorem indicates that, within
prescribed limits, the order of elimination does not have any effect on the result.

Consider the coefficient $a_{ij\cdot\cdots kl\cdots}$ having r secondary subscripts before the
k and s secondary subscripts after the l and consider the corresponding
coefficient $a_{ij\cdot\cdots lk\cdots}$ which results from an interchange of k and l. These coefficients
can be expressed by continued use of (12) in terms of coefficients of order r + 2.
The resulting expansion of $a_{ij\cdot\cdots kl\cdots}$ is equivalent to that of $a_{ij\cdot\cdots lk\cdots}$ with the
interchange of the l and the k. It follows that the theorem is true if
$a_{ij\cdot\cdots kl} = a_{ij\cdot\cdots lk}$. Now a double application of (12) to $a_{ij\cdot\cdots kl}$ leads to the expansion in
terms of coefficients of order r (using the notation $a_{ij\cdot}$ to indicate the coefficient
of the r-th order)

(13) $a_{ij\cdot\cdots kl} = a_{ij\cdot} - \dfrac{a_{ik\cdot}\, a_{jk\cdot}}{a_{kk\cdot}} - \dfrac{\left(a_{il\cdot} - \dfrac{a_{ik\cdot}\, a_{lk\cdot}}{a_{kk\cdot}}\right)\left(a_{jl\cdot} - \dfrac{a_{jk\cdot}\, a_{lk\cdot}}{a_{kk\cdot}}\right)}{a_{ll\cdot} - \dfrac{a_{lk\cdot}^{2}}{a_{kk\cdot}}}.$

Then $a_{ij\cdot\cdots lk}$ is expanded similarly, the difference is formed and found to be zero.
It follows that the theorem is true.
The application of Theorem 2 with the continued interchange of successive
secondary subscripts in all possible ways leads at once to

Theorem 3: The secondary subscripts may be interchanged in all possible ways
without changing the value of the coefficient. This theorem might be stated “The
value of the resulting coefficient is independent of the order of elimination.”
This is the sort of result one would expect to find and indeed, some may feel that 
it is intuitively evident, but this formal proof is presented for those who desire 
a more rigorous approach. 
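
Theorem 3 can also be checked numerically on the coefficients of the illustration. The sketch below is not from the original paper: the helper `eliminate` is a hypothetical name, and it simply applies formula (12) once for each eliminated variable (indices counted from zero), to every element of the matrix at once.

```python
def eliminate(matrix, order):
    """Apply formula (12) once for each variable index in `order`.

    Hypothetical helper: returns the full matrix of coefficients whose
    secondary subscripts are the eliminated variables, in the given order.
    """
    a = [row[:] for row in matrix]
    n = len(a)
    for h in order:
        pivot = a[h][h]
        a = [[a[i][j] - a[i][h] * a[j][h] / pivot for j in range(n)]
             for i in range(n)]
    return a


coefficients = [
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
]
first = eliminate(coefficients, [0, 1])   # secondary subscripts 1, 2
second = eliminate(coefficients, [1, 0])  # secondary subscripts 2, 1
largest_gap = max(abs(first[i][j] - second[i][j])
                  for i in range(4) for j in range(4))
print(largest_gap)
```

The two orders of elimination agree to within rounding error, and the rows belonging to the eliminated variables are zero, as the next theorem asserts.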

Theorem 3 enables us to prove Theorem 4 which may be stated: The value of
$a_{ij\cdot kl\cdots}$ is always zero if at least one of the secondary subscripts is equal to one of
the primary subscripts.

Suppose i is this subscript. Then by Theorem 3, i may be placed in the final
position. Now by (12) we have

$a_{ij\cdot\cdots i} = a_{ij\cdot\cdots} - \dfrac{a_{ii\cdot\cdots}\, a_{ji\cdot\cdots}}{a_{ii\cdot\cdots}} = 0,$

since $a_{ji\cdot\cdots} = a_{ij\cdot\cdots}$ by Theorem 1.

A similar statement holds if j appears among the secondary subscripts.


5. The vanishing of the prediagonal entries. As an application of Theorem 4
we can show that the prediagonal entries are identically zero and this is exactly
what is needed to establish the validity of the forward Doolittle process. It is
to be noted that the prediagonal entries are of form $a_{ij\cdot 12\cdots j-1}$ with $i < j$. Then
i must equal one of the secondary subscripts and the term is zero.

It follows that no entries need be made to the left of the diagonal in the 
Abbreviated Doolittle solution and, indeed, no entries need be made in the 
original matrix below the main diagonal. A numerical problem is presented in 
the next section. 


6. Illustration. The Abbreviated Doolittle technique is illustrated in Table 
II. This illustration is essentially an illustration of a previous article [2] and 
serves as the basis, in a later section, for expansion into the standard Doolittle 
solution. The check is shown in the right hand column and the back solution 
is indicated. The check entries for the first matrix are obtained by adding the 
entries in the row to the main diagonal and then adding the entries in the 
column. All other check entries are obtained by adding the entries in the row. 

The solution is easily made once it is understood and results from continued
application of formula (9). For example

$a_{54\cdot 123} = a_{54} - a_{51} b_{41} - a_{52\cdot 1} b_{42\cdot 1} - a_{53\cdot 12} b_{43\cdot 12}$

and this is

$a_{54\cdot 123} = .8000 - (.2000)(.6000) - (.3200)(.1905) - (.4619)(-.1612) = .6935$

(see the underscored entries of Table II). Terms of this sort are easily
computed if a calculating machine, and especially one equipped with automatic
positive and negative multiplication, is available. The back solution too is
easily accomplished with a machine. It is only necessary to substitute in turn
in each of the “b” equations. Thus the value of $x_4$ is
$b_{54\cdot 123} = \dfrac{a_{54\cdot 123}}{a_{44\cdot 123}}$, the value
of $x_3$ is $b_{53\cdot 12} - b_{43\cdot 12}\, b_{54\cdot 123} = b_{53\cdot 124}$, that of $x_2$ is
$b_{52\cdot 1} - b_{42\cdot 1}\, b_{54\cdot 123} - b_{32\cdot 1}\, b_{53\cdot 124} = b_{52\cdot 134}$, etc.
The back solution of the check is treated similarly.
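
The back solution just described can be sketched as follows. The function name `back_solve` is an assumption made for this illustration; the “b” rows are the four-decimal values of Table II, so the results agree with the table only to about four places.

```python
def back_solve(b_rows):
    """Back solution from the "b" equations (8) (hypothetical helper).

    b_rows[j] holds the coefficients of x_1, ..., x_n in the (j+1)-th "b"
    equation followed by its constant term; prediagonal entries are zero.
    """
    n = len(b_rows)
    x = [0.0] * n
    for j in reversed(range(n)):
        # Substitute the values already found into the j-th "b" equation
        x[j] = b_rows[j][n] - sum(b_rows[j][i] * x[i] for i in range(j + 1, n))
    return x


# "b" rows of Table II, rounded to four decimals as tabulated
b_rows = [
    [1.0, 0.4000, 0.5000, 0.6000, 0.2000],
    [0.0, 1.0000, 0.1190, 0.1905, 0.3810],
    [0.0, 0.0000, 1.0000, -0.1612, 0.6258],
    [0.0, 0.0000, 0.0000, 1.0000, 1.1748],
]
solution = back_solve(b_rows)
print([round(value, 4) for value in solution])
```

Working from the last equation upward mirrors the order in which the values are entered at the foot of Table II.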

7. A variation in technique. Before proceeding with the presentation of a 
standard Doolittle solution it seems wise to indicate another possible variation 
in the technique in addition to the division by the diagonal coefficient rather 
than its negative. It is possible to obtain the Doolittle solution by using the 
fixed entry from the first of the equivalent rows in place of using the fixed “b” 
entry and the variable “a”. This results from the fact that 

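(14) $a_{ik\cdot\cdots}\, b_{jk\cdot\cdots} = a_{jk\cdot\cdots}\, b_{ik\cdot\cdots} \left( = \dfrac{a_{ik\cdot\cdots}\, a_{jk\cdot\cdots}}{a_{kk\cdot\cdots}} \right).$

Thus in Table II the value $a_{54\cdot 123}$ can be obtained with the use of

$a_{54\cdot 123} = a_{54} - a_{41} b_{51} - a_{42\cdot 1} b_{52\cdot 1} - a_{43\cdot 12} b_{53\cdot 12}$

as readily as with the use of

$a_{54\cdot 123} = a_{54} - a_{51} b_{41} - a_{52\cdot 1} b_{42\cdot 1} - a_{53\cdot 12} b_{43\cdot 12}.$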

See the boxed entries of Table II. 
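
The equivalence of the two techniques is easy to confirm with the tabulated values. The few lines below are only an arithmetic check, with variable names chosen here for the illustration; the numbers are the four-decimal entries of Table II, so the two results agree to about four places rather than exactly.

```python
# Entries of Table II (four decimals): equation 4 and the constant column
a54 = 0.8000
a41, a42_1, a43_12 = 0.6000, 0.1600, -0.1190   # "a" entries in the x4 column
b41, b42_1, b43_12 = 0.6000, 0.1905, -0.1612   # "b" entries in the x4 column
a51, a52_1, a53_12 = 0.2000, 0.3200, 0.4619    # "a" entries in the constant column
b51, b52_1, b53_12 = 0.2000, 0.3810, 0.6258    # "b" entries in the constant column

# Fixed "b" from the x4 column, variable "a" from the constant column
fixed_b = a54 - a51 * b41 - a52_1 * b42_1 - a53_12 * b43_12
# Fixed "a" from the x4 column, variable "b" from the constant column
fixed_a = a54 - a41 * b51 - a42_1 * b52_1 - a43_12 * b53_12

print(round(fixed_b, 4), round(fixed_a, 4))
```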

There seems to be no real choice between these techniques. The fixed “b”
is traditional in the standard Doolittle solution while the abbreviation of the
method of single division leads to a fixed “a”. The point to be emphasized here
is that either the fixed “a” or the fixed “b” can be used. Also (14) is used in
the next section in supplying details for the check portion of a standard
Doolittle method.

8. The standard Doolittle method. If no computing machine is available
or if a more detailed solution is desired, it is preferable to record the individual
products of (9) and thus arrive at the standard Doolittle method. (The division
by the diagonal coefficient rather than its negative is not a fundamental
difference.) The standard Doolittle method, from this point of view, is an expanded
form of the Abbreviated Doolittle method with more details added. Its validity
then follows from the validity of the Abbreviated Doolittle method. While it
is not true that all prediagonal terms vanish in the standard Doolittle method,
and this fact complicates the check by row sums, yet the prediagonal $a_{ij\cdot\cdots}$
(and $b_{ij\cdot\cdots}$) are all zero.

The standard Doolittle method is presented in Table III. Some remarks 
should be made about the non-recorded terms, the two check solutions, and the 
back solution. 

The blanks (—) indicate nonzero entries which are usually not presented in a
Doolittle solution. They should be considered, however, if the first check method
is to be used.

The first check method, which is the logical extension of the check method of
the Abbreviated Doolittle solution, has been outlined by Ezekiel [5]. The row
sum is the sum of all the entries in the row whether recorded or not. In order
to check, it is necessary to add these unrecorded entries, and they are available
in the columns above if we make use of formula (12). Thus, if we wish to check
the value $\sum_{i=1}^{5} a_{i1} b_{41} = 1.6200$, we have

$a_{11} b_{41} + a_{21} b_{41} + a_{31} b_{41} + a_{41} b_{41} + a_{51} b_{41} =$

$a_{41} + a_{41} b_{21} + a_{41} b_{31} + a_{41} b_{41} + a_{41} b_{51} =$

$.6000 + .2400 + .3000 + .3600 + .1200 = 1.6200.$


TABLE II

Abbreviated Doolittle Solution; illustration

    x_1       x_2       x_3       x_4      Const.     Check

  1.0000     .4000     .5000     .6000     .2000     2.7000
    —       1.0000     .3000     .4000     .4000     2.5000
    —         —       1.0000     .2000     .6000     2.6000
    —         —         —       1.0000     .8000     3.0000

  1.0000     .4000     .5000     .6000     .2000     2.7000
  1.0000     .4000     .5000     .6000     .2000     2.7000

             .8400     .1000     .1600     .3200     1.4200
            1.0000     .1190     .1905     .3810     1.6905

                       .7381    -.1190     .4619     1.0810
                      1.0000    -.1612     .6258     1.4646

                                 .5903     .6935     1.2837
                                1.0000    1.1748     2.1747

                      1.0000               .8152     1.8152
            1.0000                         .0602     1.0602
  1.0000                                  -.9366      .0635


Another check method, which is recommended by Peters and Van Voorhis [6],
sums the entries in the row only over those columns which are to be recorded.
This is presented as check method 2 of Table III. As is to be expected, the check
values of the a's and b's of the last two rows of each matrix are in agreement.

It might be noted that one may use the first check method without checking
the intermediate steps (the sums for each row) if he checks the sums for the last
two rows of each matrix.


TABLE III

Doolittle solution, with checks

  Notation                  x_1      x_2      x_3      x_4     Const.    Check     Check
                                                                        Method 1  Method 2

  a_{i1}                  1.0000    .4000    .5000    .6000    .2000    2.7000    2.7000
  a_{i2}                    —      1.0000    .3000    .4000    .4000    2.5000    2.1000
  a_{i3}                    —        —      1.0000    .2000    .6000    2.6000    1.8000
  a_{i4}                    —        —        —      1.0000    .8000    3.0000    1.8000

  a_{i1}                  1.0000    .4000    .5000    .6000    .2000    2.7000    2.7000
  b_{i1}                  1.0000    .4000    .5000    .6000    .2000    2.7000    2.7000

  a_{i2}                    —      1.0000    .3000    .4000    .4000    2.5000    2.1000
  a_{i1}b_{21}              —       .1600    .2000    .2400    .0800    1.0800     .6800
  a_{i2·1}                          .8400    .1000    .1600    .3200    1.4200    1.4200
  b_{i2·1}                         1.0000    .1190    .1905    .3810    1.6905    1.6905

  a_{i3}                    —        —      1.0000    .2000    .6000    2.6000    1.8000
  a_{i1}b_{31}              —        —       .2500    .3000    .1000    1.3500     .6500
  a_{i2·1}b_{32·1}                   —       .0119    .0190    .0381     .1690     .0690
  a_{i3·12}                                  .7381   -.1190    .4619    1.0810    1.0810
  b_{i3·12}                                 1.0000   -.1612    .6258    1.4646    1.4646

  a_{i4}                    —        —        —      1.0000    .8000    3.0000    1.8000
  a_{i1}b_{41}              —        —        —       .3600    .1200    1.6200     .4800
  a_{i2·1}b_{42·1}                   —        —       .0305    .0610     .2705     .0914
  a_{i3·12}b_{43·12}                          —       .0192   -.0745    -.1743    -.0553
  a_{i4·123}                                          .5903    .6935    1.2838    1.2839
  b_{i4·123}                                         1.0000   1.1748    2.1748

  b_{53·124}                                1.0000   -.1894    .8152    1.8152    -.3506
  b_{52·134}                       1.0000    .0970    .2238    .0602    1.0602     .4143
  b_{51·234}              1.0000    .0241    .4076    .7049   -.9366     .0634    1.3049

Remaining entries of the form for the back solution of the check: .2160, .9076, .4241.



The back solution is carried out as in Table II. If no computing machine is
available or if the detailed steps are desired they may be indicated as in Table
III. The entries in the box under the $x_4$ column are respectively $b_{54\cdot 123}\, b_{43\cdot 12}$,
$b_{54\cdot 123}\, b_{42\cdot 1}$, and $b_{54\cdot 123}\, b_{41}$. Those in the preceding column are $b_{53\cdot 124}\, b_{32\cdot 1}$ and
$b_{53\cdot 124}\, b_{31}$. The other entry is $b_{52\cdot 134}\, b_{21}$. The values of the coefficients are
obtained by subtracting these row entries from the constant term of the
corresponding “b” equation. Thus, $b_{53\cdot 124} = (.6258) - (-.1894)$; $b_{52\cdot 134} =
(.3810) - .0970 - .2238$, etc. The back solution of check method 1 agrees
with that of check method 2. A form for accomplishing the back solution of
the check is indicated at the right. It is not necessary to complete the back
solution of the check if it is not desired, and indeed, there are some who feel
that the use of the row sum check is unnecessary with modern computing
machines [7]. The basic check is substitution in the original equations.

9. Summary. The chief purpose of this paper is to show that the Doolittle
technique actually leads to a set of equations featuring a decreasing number of
unknowns. This is accomplished by the introduction of an appropriate notation
to describe the process and the establishment of certain theorems which serve
to validate the process. These theorems are of some interest aside from the
application made here. It is a secondary purpose of this paper to emphasize
the practicability and theoretical advantages (relative ease of calculating,
theoretically more accurate, less chance for numerical error, less recording, less time
consuming, more compact, and more easily checked) of the Abbreviated
Doolittle method and to explain and illustrate possible variations in technique in the
forward and check (by row sums) portions of the standard Doolittle solution.
It should be noted that the notation suggested is very useful in providing an
easy development of various theorems used in multiple and partial correlation
studies, the presentation of which is not the purpose of the present paper.
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A PROBLEM IN ESTIMATION 

By Joseph F. Daly 
The Catholic University of America 

Several recent psychological studies in the field of memory testing [1], [2], [3]
have suggested the following problem. Let each individual E in our population
be characterized by the variates $y^1, \ldots, y^p$; $y^{p+1}, \ldots, y^{p+t}$ $(p > t)$.
Suppose, however, that circumstances make it impossible for us to observe the last
t variates. For example, we may think of $y^1, \ldots, y^p$ as an individual's scores
on a battery of tests, and think of $y^{p+1}, \ldots, y^{p+t}$ as measures of certain
psychological characteristics which, though affecting the individual's performance, are
not subject to direct observation. To make up for this, assume that we have
a theory which tells us that if $y^{p+1}, \ldots, y^{p+t}$ are held constant, then the
observable y's are dependent upon them according to a specified regression equation

$y^i = x^i_\mu y^\mu \qquad (i = 1, \ldots, p;\ \mu = p + 1, \ldots, p + t).$

Somewhat more precisely, we assume the distribution laws

(1) $f(y^1, \ldots, y^{p+t}) = (2\pi)^{-\frac{1}{2}(p+t)}\, |A_{rs}|^{\frac{1}{2}} \exp\{-\tfrac{1}{2} A_{rs}(y^r - a^r)(y^s - a^s)\},$

(where $r, s = 1, \ldots, p + t$, and repeated indices are to be summed according
to the usual convention) and

(2) $f(y^1, \ldots, y^p \mid y^{p+1}, \ldots, y^{p+t}) = (2\pi\sigma^2)^{-\frac{1}{2}p} \exp\Big\{-\dfrac{1}{2\sigma^2} \sum_{i} (y^i - x^i_\mu y^\mu)^2\Big\}.$

The $x^i_\mu$ are supposed to be known, but except for the conditions imposed by (1)
and (2) nothing is known about the quantities $A_{rs}$, $a^r$, and $\sigma$. Having observed
the test scores $y^i_\alpha$ $(\alpha = 1, \ldots, N)$ obtained by N individuals $E_\alpha$ drawn at
random from the population, we wish to estimate the values $y^{p+1}_\alpha, \ldots, y^{p+t}_\alpha$
corresponding to each $E_\alpha$, and the essential parameters in the distribution law
(1), particularly the variances and covariances of $y^{p+1}, \ldots, y^{p+t}$.

We can easily find optimum estimates of the $y^\mu_\alpha$ by applying the method of
maximum likelihood to the function (2) after substituting for the $y^i$ the scores
$y^i_\alpha$ obtained by the individual in question. Thus if we write

$h_{\mu\nu} = x^i_\mu x^i_\nu, \qquad \| h^{\mu\nu} \| = \| h_{\mu\nu} \|^{-1},$

(assuming thereby that the rank of the matrix $\| x^i_\mu \|$ is t) we have

(3) $\hat{y}^\mu_\alpha = h^{\mu\nu} x^i_\nu y^i_\alpha.$

These estimates are unbiased in the sense that the expected value of $\hat{y}^\mu_\alpha$ calculated
from the distribution law (2) is $y^\mu_\alpha$.

But when we come to estimate the variances and covariances involved in (1),
the procedure is less straightforward. Under the present circumstances we
cannot use the expression

(4) $\dfrac{1}{N} \sum_{\alpha} (y^\mu_\alpha - a^\mu)(y^\nu_\alpha - a^\nu)$

for the sample covariance of $y^\mu$ and $y^\nu$. We might, of course, try substituting
the estimates $\hat{y}^\mu_\alpha$ from (3) for the unknown $y^\mu_\alpha$ in (4). But this expedient will
in general produce a biased estimate. Denoting the required covariance by
$A^{\mu\nu}$ (the element in the appropriate position in the inverse of the matrix $\| A_{rs} \|$),
we find as a matter of fact that the expected value of (4) when the $y^\mu_\alpha$ are
replaced by their estimates $\hat{y}^\mu_\alpha$ is

(5) $A^{\mu\nu} + \sigma^2 h^{\mu\nu}.$

This bias may or may not be important in any given case. But it can conceivably
be quite serious if the $A^{\mu\nu}$ are relatively small, especially if such expressions
are employed in the usual way to estimate the correlation coefficient rather than
the covariance.

Perhaps the most logical way to attack the problem is through the joint
distribution of $y^1, \ldots, y^p$ alone, obtainable by integrating the undesirable
variates $y^{p+1}, \ldots, y^{p+t}$ out of (1). We therefore consider

(6) $f(y^1, \ldots, y^p) = (2\pi)^{-\frac{1}{2}p}\, |\bar{A}_{ij}|^{\frac{1}{2}} \exp\{-\tfrac{1}{2} \bar{A}_{ij}(y^i - a^i)(y^j - a^j)\},$

where

$\bar{A}_{ij} = A_{ij} - A_{i\mu} B^{\mu\nu} A_{\nu j}, \qquad \| B^{\mu\nu} \| = \| A_{\mu\nu} \|^{-1}.$

Moreover, when account is taken of (2), we find that we must have

$A_{ij} = \dfrac{1}{\sigma^2}\, \delta_{ij}, \qquad A_{i\mu} = -\dfrac{1}{\sigma^2}\, x^i_\mu, \qquad a^i = x^i_\mu a^\mu$

($\delta_{ij}$ being Kronecker's delta). If we now form the likelihood function
$\prod_{\alpha=1}^{N} f(y^1_\alpha, \ldots, y^p_\alpha)$ from (6) for our sample, and set its derivatives with respect
to the $a^\mu$, $\sigma^2$, and the $B^{\mu\nu}$ equal to zero, we arrive, after some simplification, at
the equations

(7)
$\hat{a}^\mu = h^{\mu\nu} x^i_\nu\, \dfrac{1}{N} \sum_{\alpha} y^i_\alpha,$ [cf. (3)]

$\Big\{ \bar{A}^{ij} - \dfrac{1}{N} \sum_{\alpha} (y^i_\alpha - x^i_\mu \hat{a}^\mu)(y^j_\alpha - x^j_\nu \hat{a}^\nu) \Big\}\, \delta_{ij} = 0,$

$x^i_\rho \Big\{ \bar{A}^{ij} - \dfrac{1}{N} \sum_{\alpha} (y^i_\alpha - x^i_\mu \hat{a}^\mu)(y^j_\alpha - x^j_\nu \hat{a}^\nu) \Big\}\, x^j_\tau = 0,$

$\bar{A}^{ij} = \sigma^2 \delta^{ij} + x^i_\mu A^{\mu\nu} x^j_\nu,$

for determining the maximum likelihood estimates. The first of equations (7)
is already solved for the $a^\mu$, and the solution of the simultaneous equations for
the remaining essential parameters yields the estimates

(8) $\hat{\sigma}^2 = \dfrac{1}{N(p - t)} \sum_{\alpha} \sum_{i} (y^i_\alpha - x^i_\mu \hat{y}^\mu_\alpha)^2,$

(9) $\hat{A}^{\mu\nu} = \dfrac{1}{N} \sum_{\alpha} (\hat{y}^\mu_\alpha - \hat{a}^\mu)(\hat{y}^\nu_\alpha - \hat{a}^\nu) - \hat{\sigma}^2 h^{\mu\nu}.$

A considerable amount of algebraic manipulation is required to put the
solutions in the form given above; but since the results are about what one would
expect in view of (5), we omit the details. As is often the case, some bias
remains in the “optimum” estimates (9). However, this can be eliminated by
writing $N - 1$ in place of N. The estimate (8) of $\sigma^2$ is unbiased as it stands.
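
For a single unobserved variate (t = 1) the estimates reduce to simple sums, and a short sketch may make the roles of (3), (8) and (9) concrete. Everything named below is an assumption made for the illustration: the function `single_factor_estimates`, the regression coefficients `x`, and the small set of scores are invented, and the formulas are the least-squares forms that follow from maximizing (2) and from the marginal law (6), specialized to t = 1.

```python
def single_factor_estimates(x, scores):
    """Estimates for one unobserved variate (t = 1); hypothetical helper.

    x      : known regression coefficients x^i, one per observed test
    scores : one list of p observed scores per individual
    Returns the individual estimates as in (3), the estimate of sigma^2 as
    in (8), and the variance estimate as in (9) with divisor N.
    """
    p, n_people = len(x), len(scores)
    h = sum(xi * xi for xi in x)                   # h_{mu nu} reduces to a scalar
    y_hat = [sum(xi * yi for xi, yi in zip(x, row)) / h for row in scores]
    a_hat = sum(y_hat) / n_people
    residual_ss = sum((yi - xi * yh) ** 2
                      for row, yh in zip(scores, y_hat)
                      for xi, yi in zip(x, row))
    sigma2_hat = residual_ss / (n_people * (p - 1))
    variance_hat = (sum((yh - a_hat) ** 2 for yh in y_hat) / n_people
                    - sigma2_hat / h)              # h^{mu nu} = 1 / h when t = 1
    return y_hat, sigma2_hat, variance_hat


# Invented scores on p = 2 tests for N = 3 individuals
x = [1.0, 1.0]
scores = [[1.1, 0.9], [2.2, 1.8], [2.9, 3.1]]
y_hat, sigma2_hat, variance_hat = single_factor_estimates(x, scores)
print(y_hat, sigma2_hat, variance_hat)
```

The subtraction of `sigma2_hat / h` is the correction suggested by (5): without it the spread of the estimated values overstates the variance of the unobserved variate.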
CONFIDENCE LIMITS FOR AN UNKNOWN DISTRIBUTION FUNCTION

By A. Kolmogoroff 
Moscow, U.S.S.R. 

Let $x_1, x_2, \ldots, x_n$ be mutually independent random variables following the
same distribution law

(1) $P\{x_i \le \xi\} = F(\xi).$

A recent paper by A. Wald and J. Wolfowitz¹ deals with the problem of using
the observable values of the x's to estimate the function $F(\xi)$. In this
connection it may be useful to recall the following results published by me in 1933.²
Put

(2) $F_n(\xi) = \dfrac{N(\xi)}{n},$

where $N(\xi)$ denotes the number of those x's whose observed values do not
exceed $\xi$.

¹ A. Wald and J. Wolfowitz, “Confidence limits for distribution functions,” Annals of
Math. Stat., Vol. 10 (1939), pp. 106-118.

Theorem 1: If the function $F(\xi)$ is continuous then the distribution law of the
quantities

(3) $D_n = \sup_{\xi} | F(\xi) - F_n(\xi) |\, \sqrt{n}$

does not depend on $F(\xi)$.

Denote by $\Phi_n(\lambda)$ the value of the probability $P\{D_n < \lambda\}$ which is common
to all continuous distribution functions $F(\xi)$.

Theorem 2: For n tending to infinity, the distribution function $\Phi_n(\lambda)$ tends to

(4) $\Phi(\lambda) = \sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2\lambda^2}$

uniformly with respect to $\lambda$.

A more elementary proof of Theorem 2 was given by N. Smirnoff in 1939.³
Another paper by the same author⁴ gives a table of the function $\Phi(\lambda)$.

Without the assumption that $F(\xi)$ is continuous, we easily obtain

Theorem 3: Whatever be the distribution function $F(\xi)$,

(5) $P\{D_n < \lambda\} \ge \Phi_n(\lambda).$

Theorems 1 and 3, giving the exact lower bound of the probability that $F_n(\xi)$
will satisfy the inequality

(6) $| F(\xi) - F_n(\xi) | < \dfrac{\lambda}{\sqrt{n}}$

for all values of $\xi$, can be used to establish confidence limits for $F(\xi)$
corresponding to the confidence coefficient

(7) $\alpha = \Phi_n(\lambda).$

These confidence limits will be free from any restriction concerning the nature
of the function $F(\xi)$.
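
The confidence limits implied by (6) can be sketched as a band around the step function (2). The names `empirical_distribution` and `confidence_band` are assumptions made for this illustration, and clipping the band to the interval from 0 to 1 is a convenience added here, not part of the statement above.

```python
import math


def empirical_distribution(sample):
    """Return F_n of (2): the fraction of observations not exceeding xi."""
    n = len(sample)

    def f_n(xi):
        return sum(1 for x in sample if x <= xi) / n

    return f_n


def confidence_band(sample, lam):
    """Return a function giving the limits F_n(xi) -/+ lam / sqrt(n) of (6).

    The limits are clipped to [0, 1], since F itself lies in that interval.
    """
    f_n = empirical_distribution(sample)
    half_width = lam / math.sqrt(len(sample))

    def band(xi):
        centre = f_n(xi)
        return max(0.0, centre - half_width), min(1.0, centre + half_width)

    return band


# Invented sample of n = 100 observations, with lam chosen by the user
sample = [i / 100 for i in range(1, 101)]
band = confidence_band(sample, lam=1.35)
lower, upper = band(0.5)
print(lower, upper)
```

The half-width shrinks like $1/\sqrt{n}$, so four times as many observations are needed to halve the width of the band at a fixed confidence coefficient.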

² A. Kolmogoroff, “Sulla determinazione empirica di una legge di distribuzione,” Giornale
dell'Istituto Italiano degli Attuari, Vol. 4 (1933), pp. 83-91.

³ N. Smirnoff, “Sur les écarts de la courbe de distribution empirique,” Recueil Math. de
Moscou, Vol. 6 (1939), pp. 3-26.

⁴ N. Smirnoff, “On the estimation of the discrepancy between empirical curves of
distribution for two independent samples,” Bulletin de l'Université de Moscou, Série internationale
(Mathématiques), Vol. 2, fasc. 2 (1939).

For sufficiently large values of n we can use the limiting distribution (4) and
write

(8) $\alpha \approx \Phi(\lambda).$

The following short table, based on that of Smirnoff,⁴ gives the values of $\lambda$
corresponding to a few chosen confidence coefficients $\alpha$.

TABLE OF λ

     α        λ

    .95      1.35
    .98      1.52
    .99      1.63
    .995     1.73
    .998     1.86
    .999     1.95
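
The limiting function (4) converges very quickly, so the tabulated values can be checked with a few terms of the series. The function name `kolmogorov_phi` and the number of terms are choices made for this sketch; the table entries are those given above.

```python
import math


def kolmogorov_phi(lam, terms=100):
    """Limiting distribution (4): sum over all k of (-1)^k exp(-2 k^2 lam^2).

    The terms for k and -k are equal, so the series is written as
    1 + 2 * sum over k >= 1.  For lam <= 0 the value is taken as 0.
    """
    if lam <= 0:
        return 0.0
    return 1.0 + 2.0 * sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
                           for k in range(1, terms + 1))


# Confidence coefficients and the corresponding values from the table
table = [(0.95, 1.35), (0.98, 1.52), (0.99, 1.63),
         (0.995, 1.73), (0.998, 1.86), (0.999, 1.95)]
for alpha, lam in table:
    print(alpha, lam, round(kolmogorov_phi(lam), 4))
```

The computed values agree with the tabulated confidence coefficients to within the two-decimal precision to which $\lambda$ is given.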


Smirnoff's paper⁴ contains still another application of the function $\Phi(\lambda)$.
Denote by $x_1, x_2, \ldots, x_{n_1}$ and $z_1, z_2, \ldots, z_{n_2}$ two sequences of mutually
independent random variables following the same probability law $F(\xi)$. Let further
$F_{n_1}(\xi)$ and $F_{n_2}(\xi)$ be two random step functions corresponding to these series,
defined as in (2). Smirnoff proves then the following

Theorem 4: If the probability law $F(\xi)$ is continuous, then the probability

(9) $P\Big\{ \sup_{\xi} | F_{n_1}(\xi) - F_{n_2}(\xi) | < \lambda \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}} \Big\} = \Phi_{n_1,n_2}(\lambda)$

is independent of the function $F(\xi)$. If $n_1$ and $n_2$ are indefinitely increased subject
to the restriction that the ratio $n_1/n_2$ remains between two fixed numbers $a_1$ and $a_2$,

(10) $0 < a_1 \le \dfrac{n_1}{n_2} \le a_2 < +\infty,$

then

(11) $\Phi_{n_1,n_2}(\lambda) \to \Phi(\lambda).$

In the general case, where the probability law $F(\xi)$ is absolutely arbitrary, we have

(12) $P\Big\{ \sup_{\xi} | F_{n_1}(\xi) - F_{n_2}(\xi) | < \lambda \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}} \Big\} \ge \Phi_{n_1,n_2}(\lambda).$

Owing to the above results the quantity

(13) $D_{n_1,n_2} = \sup_{\xi} | F_{n_1}(\xi) - F_{n_2}(\xi) |$

could be used as a criterion to test the hypothesis that the probability laws of
the two series of observable variables are actually the same.
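
A sketch of the criterion (13) follows. The function name `smirnov_statistic` is an assumption made here, and the two small samples are invented; the second returned value is the statistic multiplied by $\sqrt{n_1 n_2 / (n_1 + n_2)}$, which is the quantity to be compared with a value of $\lambda$ from the table when both samples are large.

```python
import math


def smirnov_statistic(xs, zs):
    """Return D of (13) and D * sqrt(n1 * n2 / (n1 + n2)).

    The supremum is attained at an observed value, so it is enough to
    compare the two step functions at every observation.
    """
    n1, n2 = len(xs), len(zs)
    points = sorted(set(xs) | set(zs))
    d = max(abs(sum(1 for x in xs if x <= t) / n1
                - sum(1 for z in zs if z <= t) / n2)
            for t in points)
    return d, d * math.sqrt(n1 * n2 / (n1 + n2))


# Two invented samples: interleaved values, and completely separated values
d_close, scaled_close = smirnov_statistic([1, 3, 5], [2, 4, 6])
d_far, scaled_far = smirnov_statistic([1, 2, 3, 4], [5, 6, 7, 8])
print(d_close, scaled_close)
print(d_far, scaled_far)
```

With samples this small the limiting distribution is only a rough guide; the example is meant to show the computation, not to recommend a sample size.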
CORRECTIONS TO A PAPER ON THE UNIQUENESS PROBLEM
OF MOMENTS

By M. G. Kendall

London, England

I wish to make certain corrections in my paper on “Conditions for Uniqueness
in the Problem of Moments” (Annals of Math. Stat., Vol. 11 (1940), p. 402).
I thought I had succeeded in improving on results given earlier by Stieltjes,
Lévy and Carleman, but this is not so.

Theorem 1 of the paper stated that a set of moments determines a distribution
uniquely if $\sum_{r=0}^{\infty} \dfrac{\nu_r t^r}{r!}$ converges for some real non-zero t, $\nu_r$ being the absolute
moment of order r. This is true, and a similar result has been proved by Lévy,
but my proof contained a small lacuna. It was shown that the characteristic
function $\phi(t)$ has a Taylor expansion which, under the conditions of the theorem,
is convergent; but it has also to be shown that it is equal to the sum of that
expansion. This may be seen as follows:

We have

$e^{itx} = \sum_{r=0}^{n-1} \dfrac{(itx)^r}{r!} + \theta\, \dfrac{|tx|^n}{n!}, \qquad |\theta| \le 1,$

and hence, on taking mean values,

$\phi(t) = \sum_{r=0}^{n-1} \dfrac{(it)^r \mu_r}{r!} + \theta'\, \dfrac{\nu_n |t|^n}{n!}, \qquad |\theta'| \le 1.$

Since by hypothesis $\dfrac{\nu_n |t|^n}{n!} \to 0$, $\phi(t)$ must be equal to the sum of its (convergent)
Taylor expansion.

The principal error was a statement that $\nu_n^{1/n}/n$ must either tend to a limit or
diverge. For this reason, the second theorem should run: a set of moments
determines a distribution uniquely if $\overline{\lim}\ \nu_n^{1/n}/n$ is finite (not $\lim\ \nu_n^{1/n}/n$ as originally
stated). Theorem 3 should also be restated with the upper limit substituted
for the limit therein.

Theorem 4 stated that a set of moments uniquely determines a distribution
if $\sum \dfrac{1}{\nu_n^{1/n}}$ diverges. A rigorous proof is as follows:
The characteristic function obeys the relation

$| \phi^{(n)}(t) | \le \nu_n$

provided, of course, that $\nu_n$ exists. A theorem of Denjoy¹ states that if a
function $f(x)$, defined in the segment (a, b), possesses derivatives of all orders therein,
if $M_n$ is the maximum of $| f^{(n)}(x) |$ in the segment and if $\sum \dfrac{1}{M_n^{1/n}}$ is divergent,
then $f(x)$ is completely determined by its value and that of its derivatives at a
single point. $\phi(t)$ obeys the conditions of the theorem and by taking the point
to be t = 0, theorem 4 follows.

¹ Arnaud Denjoy, “Sur les fonctions quasi-analytiques de variable réelle,” Comptes Rendus,
Vol. 173 (1921), p. 1899.

I hope that this note will correct any misunderstandings that may have arisen
on the main paper, and I regret that a number of circumstances, not the least
of which is war, have made it impossible to forward the correction at an
earlier date.


ANNOUNCEMENT CONCERNING COMPUTATION OF 
MATHEMATICAL TABLES 

In the December, 1939, issue of the Annals of Mathematical Statistics, p. 399, 
there appeared an Announcement of the Mathematical Tables Project. This 
project is operated by the Work Projects Administration of New York City, 
as O. P. No. 265-2-97-11 under the technical supervision of Dr. A. N. Lowan.
It is sponsored by the National Bureau of Standards, Dr. Lyman J. Briggs, 
Director. 

In order to keep the readers of the Annals up-to-date on the progress of the 
work of the Project, information will be released from time to time. 

The following list shows the status of work, as of October, 1941. The reader
is referred to the December, 1939 issue of the Annals with respect to which n
will denote the nth item of Tables Published, Pn will denote the nth item of
Tables in Progress and Cn will denote the nth item of Tables under Consideration.

Tables published. 1, 2, 3, P1, P2, P3, P4, P6(b), P6(c), P6(d), P6(e), P7,
C7 and also

1. Table of Five-Point Lagrangian Interpolants for arguments ranging be¬ 
tween 0 and 2 at intervals of 0.001. 

2. Tables of Grid Coordinates (American Polyconic Projection) at 5 minute 
intervals of latitude and longitude for latitude from 70°N to 28°N and for lati¬ 
tude from 49°N to 72°N. 

3. Table for Map Projections of Northwestern Extension of U. S. 

Tables in process of reproduction. P5, P6(a), P8 and C1 for [0 (.001) 7 (.01)
50 (.1) 300 (1) 2,000 (10) 10,000; 12D] also

1. Tables of Section Moduli and Moments of Inertia for Structural Members 
used in Naval Architecture. (For the Bureau of Marine Inspection and 
Navigation.) 

2. Tables of Si(x) and Ci(x) for x ranging from 10 to 100 at intervals of 0.001. 
3. The zeros of the Legendre Polynomials up to the 16th order to 16 decimal 
places and the Weight Coefficients for Gauss’ Mechanical Quadrature Formula. 

Tables for which manuscripts are completed. P9, P11, C6, (the function z", 
instead of A(x, y), has been tabulated to 16 places), and also 

1. Table of $\int_0^x J_0(t)\,dt$ for x from 0 to 10 at intervals of 0.01 to 10 places. 

Tables for which computations are completed. P10 (also tanh x, coth x), 
C2, C3, (change to n = -21, -20, ..., 0) and also 

1. Various hydraulic tables based on Kutter’s and Manning’s formulae. 
(Tabulation suggested by the War Department.) 

2. Table of reciprocals of the integers from 100,000 to 200,000. 

3. Table of the Associated Legendre Functions $P_n^m(x)$ and $Q_n^m(x)$ for n ranging 
between 1 and 10, and m between 0 and 4; for arguments x and ix where x 
ranges between 0 and 10 at intervals of 0.1. Also corresponding values for 
half-integral values of n and values of the functions for arguments in degrees. 
(Tabulation suggested by National Defense Research Committee.) 

4. Tables of R sin θ and R cos θ. R = 1000 (10) 10,000, θ = 5(6)800 (in 
mils). 

Tables for which computations are in progress. C3 (for n = 1, 2, • • • 20) 
and also 

1. Table of the Bessel Functions $Y_0(z)$ and $Y_1(z)$ for the same complex arguments 
as in $J_0(z)$ and $J_1(z)$, mentioned in P9. 

2. Tables of Length of Meridional Arc at one-minute intervals. 

3. Tables of the Confluent Hypergeometric Function for selected values of 
the parameters. 

4. Tables of three-point, four-point, six-point and seven-point Lagrangian 
Interpolants. 

5. Table of Tchebysheff Polynomials. 

Tables under consideration. C4 and also 

1. Table of the first 10 powers of the reciprocals of the integers from 1 to 1,000. 

2. Extensive tables of Elliptic Functions for both real and imaginary 
arguments. 

3. A 12-place table of Inverse Circular and Hyperbolic Functions other than 
Arc tan x. 

4. Table of the Integral $\int Y_0(t)\,dt$. 

5. Tables of the non-periodic solutions of the Mathieu Differential Equation. 

6. Table of the Error Functions for complex arguments (suggested by Federal 
Communication Commission). 

7. Tables of the Unit-Sigma Functions and their integrals. 
8. Tables of Circular Functions for Complex Arguments. 

9. Tables of the Zeros of the Hermite and Laguerre Polynomials and of the 
corresponding Weight Factors in Gauss’ Mechanical Quadrature Formula. 

10. Table of Lamé Polynomials. 

11. Table of Military Grid Coordinates for certain “Control Stations.” (For 
the War Department.) 

12. Tables of the Chi-Square Distribution and “Student’s” t-distribution. 

13. Tabulation of Fisher’s A-, B-, and C- Distributions of the Multiple Correlation 
Coefficients. 

The Project would welcome suggestions for the computation of new tables of 
interest in pure and applied mathematics, as well as information regarding computational 
work in progress elsewhere. 

Communications should be addressed to Major Irving V. Huie, Administrator, 
Work Projects Administration, 70 Columbus Avenue, New York City. 

Requests for copies of published tables should be addressed to Dr. Lyman J. 
Briggs, Director of the National Bureau of Standards, Washington, D. C. 



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The Fourth Summer Meeting of the Institute of Mathematical Statistics was 
held at The University of Chicago, Tuesday to Thursday, September 2 to 4, 
1941, in conjunction with the meetings of the American Mathematical Society, 
the Mathematical Association of America, and the Econometric Society. The 
following sixty-eight members of the Institute attended the meeting: 

R. L. Anderson, T. W. Anderson, K. J. Arnold, H. M. Bacon, Walter Bartky, W. D. 
Baten, A. A. Bennett, Paul Boschan, I. W. Burr, J. H. Bushey, W. E. Cederberg, W. G. 
Cochran, A. T. Craig, C. C. Craig, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, 
P. L. Dressel, P. S. Dwyer, Churchill Eisenhart, M. L. Elveback, H. P. Evans, C. H. Fischer, 
W. C. Flaherty, R. M. Foster, C. H. Graves, Louis Guttman, W. L. Hart, F. C. Hinds, 
A. S. Householder, E. V. Huntington, William Hurwitz, M. H. Ingraham, Dunham Jackson, 
Leo Katz, J. F. Kenney, L. A. Knowler, L. F. Knudsen, Tjalling Koopmans, C. F. Kossack, 
O. E. Lancaster, D. H. Leavens, B. A. Lengyel, W. G. Madow, J. N. Michie, A. M. Mood, 
J. E. Morton, Leah Naugle, Harold Nisselson, J. I. Northam, E. G. Olds, Oystein Ore, 
C. K. Payne, G. A. D. Preinreich, Francis Regan, Selby Robinson, C. F. Roos, M. M. 
Sandomire, Max Sasuly, Henry Scheffe, H. M. Schwartz, Harry Siller, J. H. Smith, M. E. 
Wescott, S. S. Wilks, E. W. Wilson, Gale Young. 

The opening session, on Tuesday morning, was devoted to contributed papers 
on Probability and Statistics and was held jointly with the American Mathematical 
Society and the Econometric Society. The Chairman was Professor 
A. T. Craig, University of Iowa, and the following papers were presented: 

1. A geometric derivation of Fisher’s z-transformation. 

J. B. Coleman, University of South Carolina. 

2. Large sample distribution of the likelihood ratio. 

Abraham Wald, Columbia University. 

3. On the integral equation of renewal theory. 

(Read by title.) 

Willy Feller, Brown University. 

4. Cumulative frequency functions. 

Irving Burr, Purdue University. 

5. On spherical probability distributions. 

K. J. Arnold, Massachusetts Institute of Technology. 

6. Some observations on analysis of variance theory. 

(Read by title.) 

Hilda Geiringer, Bryn Mawr College. 

7. On the asymptotic distribution of medians of samples from a multivariate population. 

A. M. Mood, University of Texas. 

8. A problem of estimation. 

J. F. Daly, Catholic University. 

Abstracts of these papers follow this report. 

On Tuesday afternoon a session was held jointly with the Econometric Society 
on Time Series Analysis. Under the chairmanship of Professor C. C. Craig of 
the University of Michigan, the following papers were presented: 





1. Is sampling theory applicable to economic time series? 

Tjalling Koopmans, Penn Mutual Life Insurance Co., Philadelphia. 

2. Serial correlation. 

R. L. Anderson, North Carolina State College. 

The morning session on Wednesday was held jointly with the Econometric 
Society on Curve Fitting. The chair was held by Dr. J. Marschak of the New 
School for Social Research and the following papers were presented: 

1. Weights to compensate for transformation in curve fitting. 

T. O. Yntema, University of Chicago and Cowles Commission. 

2. Curve fitting by cumulative addition. 

John H. Smith, University of Chicago and Cowles Commission. 

On Wednesday afternoon, Professor S. S. Wilks of Princeton University acted 
as chairman of a session on Multivariate Analysis. The following papers were 
read: 

1. On testing sets of means and discriminant analysis. 

Abraham Wald, Columbia University. 

2. On tests of hypotheses concerning variances and covariances. 

William G. Madow, Bureau of the Census. 

The Josiah Willard Gibbs Lecture of the American Mathematical Society was 
delivered on Wednesday evening by Professor Sewall Wright of the University 
of Chicago. His topic was Statistical Genetics and Evolution. 

On Thursday morning a joint session on Demand and Supply Analysis was 
held with the Econometric Society. At this session Dr. C. F. Roos of the Institute 
of Applied Econometrics presided, and the following papers were 
presented: 

1. Demand analysis for certain commodities based on income and budget data. 

J. Marschak, New School for Social Research, and George Garvey, National Bureau 
of Economic Research. 

2. Derivation of elasticities of demand and supply: A direct method. 

Oscar Lange, University of Chicago and Cowles Commission. 

3. On the workings of a general equilibrium system. 

J. L. Mosak, University of Chicago and Cowles Commission. 

An informal reception was held on Monday evening in the Judson Court 
Lounge. On Tuesday and Wednesday afternoons the ladies of the Mathematics 
Department of the University of Chicago served tea in the Eckhart Hall Common 
Room. After the joint session on Tuesday afternoon, the Cowles Commission 
for Research in Economics gave a tea in the Common Room of the Science 
Building. On Thursday evening a joint dinner of the four mathematical organizations 
was held in Hutchinson Commons, preceded by an informal reception 
at the Reynolds Club. 

Edwin G. Olds, 

Secretary 



ABSTRACTS OF PAPERS 

(Presented on September 2, 1941, at the Chicago Meeting of the Institute) 


A Geometric Derivation of Fisher’s z-transformation. J. B. Coleman, 
University of South Carolina. 

In fitting points in a plane by a line so that the sum of the squares of the perpendicular 
deviations shall be a minimum, a second line is found for which the sum of the squares of 
the deviations is a maximum. Let $\Sigma d^2$ be the sum of the squares of the deviations of the 
points from the minimum line, and $\Sigma D^2$ be the sum of the squares from the maximum line. 
Then $\Sigma D^2 / \Sigma d^2 = (1 + r)/(1 - r)$. $\tfrac{1}{2} \log [(1 + r)/(1 - r)]$ is Fisher’s z-transformation 
for testing the coefficient of correlation. 
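
The ratio stated in the abstract above can be checked numerically. The following is only a sketch, under the assumption (not stated in the abstract) that both variates are first standardized to zero mean and unit variance and that r is positive; the function names and the sample data are hypothetical. It scans line directions through the centroid, takes the largest and smallest sums of squared perpendicular deviations, and compares their ratio with (1 + r)/(1 - r).

```python
import math

def standardize(values):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sd for x in values]

def perpendicular_ss(u, v, phi):
    """Sum of squared perpendicular distances from the points (u, v) to the
    line through the origin whose direction makes angle phi with the x-axis."""
    s, c = math.sin(phi), math.cos(phi)
    return sum((b * c - a * s) ** 2 for a, b in zip(u, v))

def ss_ratio_and_r(xs, ys, grid=3600):
    """Return (max SS / min SS over scanned directions, correlation r)."""
    u, v = standardize(xs), standardize(ys)
    r = sum(a * b for a, b in zip(u, v)) / len(u)
    # The grid over [0, pi) includes the 45 and 135 degree directions,
    # which are the extremal lines for standardized data.
    sums = [perpendicular_ss(u, v, math.pi * i / grid) for i in range(grid)]
    return max(sums) / min(sums), r

# Hypothetical illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
ratio, r = ss_ratio_and_r(xs, ys)
z = 0.5 * math.log(ratio)  # half the log of the ratio
print(ratio, (1 + r) / (1 - r), z, math.atanh(r))
```

For negative r the same computation gives (1 + |r|)/(1 - |r|), since the roles of the two lines are interchanged.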


Large Sample Distribution of the Likelihood Ratio. Abraham Wald, Columbia 
University. 

The large sample distribution of the likelihood ratio has been derived by S. S. Wilks 
(Annals of Math. Stat., Vol. 9 (1938)) in case of a linear composite hypothesis and under 
the assumption that the hypothesis to be tested is true. Here a general composite hypothesis 
is considered and the distribution in question is derived also in case that the 
hypothesis to be tested is not true. Let $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ be the joint probability 
density function of the variates $x_1, \ldots, x_p$ involving k unknown parameters $\theta_1, \ldots, \theta_k$. 
Denote by $H_\omega$ the hypothesis that the true parameter point $\theta = (\theta_1, \ldots, \theta_k)$ satisfies the 
equations $\xi_1(\theta) = \cdots = \xi_r(\theta) = 0$, $(r < k)$. Denote by $\lambda_n$ the likelihood ratio statistic for 
testing $H_\omega$ on the basis of n independent observations on $x_1, \ldots, x_p$. For any parameter 
point $\theta$ let $\xi_{ij}(\theta) = \partial \xi_i(\theta) / \partial \theta_j$ and let $c_{ij}(\theta)$ be the expected value of 

$$\frac{\partial \log f(x_1, \ldots, x_p, \theta)}{\partial \theta_i} \, \frac{\partial \log f(x_1, \ldots, x_p, \theta)}{\partial \theta_j}$$

calculated under the assumption that $\theta$ is the true parameter point. 

For any $\theta$ denote by $A(\theta)$ the matrix $\| \xi_{ij}(\theta) \|$ $(i = 1, \ldots, r;\ j = 1, \ldots, k)$ and let 
$\| \sigma_{ij}(\theta) \| = \| c_{ij}(\theta) \|^{-1}$, $(i, j = 1, \ldots, k)$. Let furthermore $\| \sigma'_{uv}(\theta) \|$, $(u, v = 1, \ldots, r)$ 
be the matrix equal to the product $A(\theta) \cdot \| \sigma_{ij}(\theta) \| \cdot \bar{A}(\theta)$, where $\bar{A}(\theta)$ is the transpose of 
$A(\theta)$. Finally let $\| c'_{uv}(\theta) \| = \| \sigma'_{uv}(\theta) \|^{-1}$, $(u, v = 1, \ldots, r)$. For each n and $\theta$ denote 
by $y_{1n}(\theta), \ldots, y_{rn}(\theta)$ a set of r variates which have a joint normal distribution with mean 
values $\sqrt{n}\,\xi_1(\theta), \ldots, \sqrt{n}\,\xi_r(\theta)$ and covariance matrix $\| \sigma'_{uv}(\theta) \|$, $(u, v = 1, \ldots, r)$. Denote 
the quadratic form 

$$\sum_{v=1}^{r} \sum_{u=1}^{r} y_{un}(\theta)\, y_{vn}(\theta)\, c'_{uv}(\theta)$$

by $Q_n(\theta)$. It has been shown that under certain assumptions on $f(x_1, \ldots, x_p, \theta)$, 
$\xi_1(\theta), \ldots, \xi_r(\theta)$ we have 

$$\lim_{n \to \infty} \{ P(-2 \log \lambda_n < t \mid \theta) - P[Q_n(\theta) < t \mid \theta] \} = 0$$

uniformly in t and $\theta$, where for any z, $P(z < t \mid \theta)$ denotes the 
probability that z < t holds under the assumption that $\theta$ is the true parameter point. The 
distribution of $Q_n(\theta)$ is known and has been treated in the literature. If $H_\omega$ is true, then 
$\xi_1(\theta) = \cdots = \xi_r(\theta) = 0$, and $Q_n(\theta)$ has the $\chi^2$ distribution with r degrees of freedom. 
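
As an illustration of the limiting statement in the abstract above, the following sketch treats the simplest case one can write down, which is an assumption of this example and not taken from the abstract: a single normal variate with unit variance and unknown mean theta, and the hypothesis theta = 0 (so k = r = 1). In that case -2 log lambda_n equals n times the squared sample mean, and the comparison variate Q_n is the square of a normal variate with mean sqrt(n) * theta and unit variance, whose expected value is 1 + n * theta^2. All function names are hypothetical.

```python
import random

def neg2_log_lambda(sample):
    """-2 log(likelihood ratio) for testing theta = 0 when X ~ N(theta, 1).
    The ratio is exp(-n * mean^2 / 2), so the statistic is n * mean^2."""
    n = len(sample)
    mean = sum(sample) / n
    return n * mean * mean

def simulated_mean_of_statistic(theta, n, reps, seed=0):
    """Monte Carlo estimate of E[-2 log lambda_n] when theta is the true value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += neg2_log_lambda(sample)
    return total / reps

def expected_q(r, n, theta):
    """Mean of the comparison quadratic form in this example: degrees of
    freedom plus the noncentrality n * theta^2 (valid here with r = 1)."""
    return r + n * theta * theta

if __name__ == "__main__":
    for theta in (0.0, 0.3):
        est = simulated_mean_of_statistic(theta, n=25, reps=5000, seed=1)
        print(theta, est, expected_q(1, 25, theta))
```

With theta = 0 the simulated mean should be close to 1, the mean of chi-square with one degree of freedom; with theta different from 0 the shift by n * theta^2 shows the behaviour when the hypothesis tested is not true.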

On the Integral Equation of Renewal Theory. W. Feller, Brown University. 

As is well-known, the equation $U(t) = G(t) + \int_0^t U(t - x)\, dF(x)$ has frequently been 
discussed, under different forms, in connection with the population theory, the theory of 
industrial replacement, etc. In the present paper it is shown that, using Tauberian 
theorems for Laplace integrals, it becomes possible to analyse in detail the asymptotic 
behavior of $U(t)$ as $t \to \infty$ and also to solve some other problems which have been discussed 
in the literature. Strict conditions for the validity of different methods to treat the equation 
are given together with some modifications found to be necessary. The paper will 
appear in the Annals of Mathematical Statistics. 
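
A renewal equation of the kind discussed in the abstract above can be solved numerically on a grid. The sketch below is not the method of the paper (which is analytic); it is a simple first-order discretization in which dF over each grid cell is replaced by the increment of F, and the function name, step size, and test case are assumptions of this example. The test case takes F exponential with G equal to its density, for which the solution is known to be the constant rate.

```python
import math

def solve_renewal(G, F, t_max, h):
    """Approximate U on the grid 0, h, 2h, ..., t_max for
        U(t) = G(t) + integral_0^t U(t - x) dF(x),
    replacing dF on ((j-1)h, jh] by the increment F(jh) - F((j-1)h)."""
    steps = int(round(t_max / h))
    dF = [0.0] + [F(j * h) - F((j - 1) * h) for j in range(1, steps + 1)]
    U = []
    for i in range(steps + 1):
        # Discrete convolution of earlier values of U with increments of F.
        conv = sum(U[i - j] * dF[j] for j in range(1, i + 1))
        U.append(G(i * h) + conv)
    return U

# Hypothetical test case: exponential lifetimes with rate 1, G = density of F.
rate = 1.0
F_exp = lambda x: 1.0 - math.exp(-rate * x)
G_exp = lambda t: rate * math.exp(-rate * t)
U_grid = solve_renewal(G_exp, F_exp, t_max=5.0, h=0.01)
print(U_grid[0], U_grid[-1])  # both should be close to the rate, 1.0
```

For other choices of F and G the scheme is only approximate, and the step h would need to be reduced until the computed values stabilize.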

Cumulative Frequency Functions. I. W. Burr, Purdue University. 

Frequency and probability functions play a fundamental role in statistical theory and 
practice. They are, however, often inconvenient and difficult to use, since it is necessary 
to integrate or sum to find the probability for a given range. Theoretically the cumulative 
or integral frequency function would seem to be better adapted to determining such 
probabilities, since the latter can be found simply by a subtraction. The aim of this paper is 
to make a contribution toward the direct use of cumulative frequency functions. Some 
general properties and theory of cumulative functions are presented with particular emphasis 
upon certain moment functions adapted to such direct use. Both continuous and discrete 
cases are included. A list of possible cumulative functions is given and a particular 
one, $F(x) = 1 - (1 + x^c)^{-k}$, discussed fully. This function has properties which make it 
practicable and adaptable to a wide variety of distribution types. It well illustrates the 
possibilities of the cumulative approach. 
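
The point that a range probability follows from a cumulative function by a single subtraction can be sketched as follows. The example assumes the cumulative function has the form F(x) = 1 - (1 + x^c)^(-k) for x > 0, the form usually associated with this paper; the parameter names c and k, the function names, and the numerical values are assumptions of this illustration.

```python
def burr_cdf(x, c, k):
    """Cumulative function F(x) = 1 - (1 + x**c)**(-k) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)

def interval_probability(a, b, c, k):
    """P(a < X <= b), obtained by one subtraction of cumulative values."""
    return burr_cdf(b, c, k) - burr_cdf(a, c, k)

# Hypothetical parameter values.
print(burr_cdf(1.0, c=2.0, k=3.0))                   # 1 - 2**-3 = 0.875
print(interval_probability(0.5, 1.0, c=2.0, k=3.0))  # F(1.0) - F(0.5)
```

No integration or summation of a frequency function is needed; only two evaluations of F.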

On Spherical Probability Distributions. Kenneth J. Arnold, Massachusetts 
Institute of Technology. 

Two methods of correspondence for circular distributions to the normal error function 
have led to non-constant absolutely continuous functions [See F. Zernike’s article in 
Handbuch der Physik, Vol. 3, pp. 477-478]. The corresponding distributions for the sphere are 
found. The case of diametrical symmetry for both circle and sphere is discussed. Tables 
of the probability integrals involved are given and an application in geology is included. 

Some Observations on Analysis of Variance Theory. Hilda Geiringer, 
Bryn Mawr College. 

The test functions used in analysis of variance present themselves in different classes 
of important problems. Their distribution has been determined and tabulated by R. A. 
Fisher¹ under the hypothesis that the chance variables are all independent of each other and 
subject to the same normal law. Consequently we can in this way test only the hypothesis 
that the theoretical populations have all these properties. 

If it is not possible to determine the exact distribution of test functions under sufficiently 
general assumptions regarding the populations we may: (a) find an asymptotic solution of 
the problem, i.e. determine the distribution of the test functions for large samples.² Or (b) 
determine at least the mathematical expectations and the variances of the test functions 
for appropriately general populations and for small samples. 

It is well known that the expectations of the two quadratic forms which are basic in the 
analysis of variance are equal, even if the n populations are not normal but equal to each 
other (Bernoulli series). But, in addition, we can prove the mathematical theorem that, 
under the same conditions, the expectation of their quotient equals one. The next step 
consists in studying the case that the n distributions are not equal to each other and in 
investigating certain inequalities characteristic for the Lexis Series and Poisson Series. These 
different criteria are completed by the computation of the variances of the test functions. 

In addition to the above mentioned test functions known as “variance within” and 
“variance among” classes other symmetrical test functions have been considered in the 
classical analysis of variance. Here again we may assume quite general populations. It 
results that the Lexis as well as the Poisson Series may now be characterized by equalities 
(instead of inequalities). 

Finally it seems to be worthwhile to omit the assumption of independent chance variables 
and to study different kinds of mutual dependence. These investigations lead to new 
instructive inequalities among the expectations. These last considerations seem to be 
connected with Fisher’s “intraclass correlation” and to supplement this idea. 

¹ Metron, Vol. 5 (1926), pp. 90-104. 

² See e.g. W. G. Madow, Annals of Math. Stat., Vol. 11 (1940), p. 193. 

On the Asymptotic Distribution of Medians of Samples from a Multivariate 
Population. A. M. Mood, University of Texas. 

Let two variates $x_1$ and $x_2$ have a density function $f(x_1, x_2)$ which, besides being positive 
or zero and having its integral over the whole space equal to one, shall satisfy these 
conditions: 


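$$\int f\!\left(x_1, \tfrac{1}{n}\right) dx_1 = \int f(x_1, 0)\, dx_1 + O\!\left(\tfrac{1}{n}\right)$$

$$\int f\!\left(\tfrac{1}{n}, x_2\right) dx_2 = \int f(0, x_2)\, dx_2 + O\!\left(\tfrac{1}{n}\right)$$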


The coordinate system is assumed to have been chosen so that the population median is at 
the origin. Let $(\tilde{x}_1, \tilde{x}_2)$ be the median of a sample of 2n + 1 elements drawn from a 
population with this density function. It is shown that for large samples $(\tilde{x}_1, \tilde{x}_2)$ is normally 
distributed to within terms of order $1/\sqrt{n}$ with zero means and variances and covariances 
given by certain integrals of $f(x_1, x_2)$. 

A similar result is true for k as well as two variates. 
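
The large-sample behaviour described in the abstract above can be examined by simulation. The sketch below assumes a bivariate normal population with unit variances and correlation rho, and compares the simulated variance and covariance of the coordinate-wise sample medians with the standard asymptotic values pi/(2N) and arcsin(rho)/N for a sample of N = 2n + 1 elements. These closed forms are well-known results for the normal case and are not quoted in the abstract; the function names are hypothetical.

```python
import math
import random
import statistics

def simulate_median_moments(rho, n, reps, seed=0):
    """Simulate coordinate-wise medians of samples of size 2n + 1 from a
    bivariate normal population (unit variances, correlation rho) and return
    (variance of the first median, covariance of the two medians)."""
    rng = random.Random(seed)
    size = 2 * n + 1
    scale = math.sqrt(1.0 - rho * rho)
    m1, m2 = [], []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(size):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + scale * z2)
        m1.append(statistics.median(xs))
        m2.append(statistics.median(ys))
    mean1, mean2 = statistics.fmean(m1), statistics.fmean(m2)
    var1 = sum((a - mean1) ** 2 for a in m1) / reps
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(m1, m2)) / reps
    return var1, cov

def asymptotic_median_moments(rho, n):
    """Asymptotic variance and covariance of the medians in the normal case."""
    size = 2 * n + 1
    return math.pi / (2 * size), math.asin(rho) / size

if __name__ == "__main__":
    print(simulate_median_moments(0.5, n=50, reps=1000, seed=3))
    print(asymptotic_median_moments(0.5, n=50))
```

Agreement is only approximate for finite n and a finite number of replications.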


A Problem in Estimation. Joseph F. Daly, The Catholic University of America. 

Consider a normal population in which each individual is characterized by the variates 
$y_1, \ldots, y_p, y_{p+1}, y_{p+2}$. Suppose that the latter two are not directly observable, but that 
for given values of $y_{p+1}, y_{p+2}$ the first set of y’s is independently distributed about the 
“regression line” $y_k = y_{p+1} + k\,y_{p+2}$ $(k = 1, \ldots, p)$ with a common variance $\sigma^2$. For each 
individual, one can thus determine values $\hat{y}_{p+1}, \hat{y}_{p+2}$ from the observed $y_1, \ldots, y_p$, using 
the method of least squares. Assuming a similar relation between the expected values of 
$y_1, \ldots, y_{p+2}$ in the original population, these estimates $\hat{y}_{p+1}, \hat{y}_{p+2}$ are, of course, unbiased. 
However, if we calculate these $\hat{y}$’s for each individual of a sample of N, and substitute them 
in the Pearson product-moment correlation formula, the estimate of the correlation 
between $y_{p+1}$ and $y_{p+2}$ thus obtained is somewhat biased. The bias depends on the number of 
observable y’s, and on the size of the variances and covariances of $y_{p+1}, y_{p+2}$ relative to $\sigma^2$. 

Is Sampling Theory Applicable to Economic Time Series? T. J. Koopmans, 
Penn Mutual Life Insurance Company. 

The classical regression theory assumes that the values of the independent variables 
remain the same in repeated samples. Certain situations in economic analysis, like price 
formation according to the “cobweb” theorem, require a sampling theory of serial regression 
in which certain observations may represent a dependent variable at one time and an 
independent variable at a later time. This leads to the problem of the joint distribution of 
certain quadratic forms in normal variables. 

The simplest problem of this type is that of the distribution of the ratio r = q/p of a 
quadratic form q in T observations from a normal distribution with mean 0 to the sum p 
of the squares of these observations. The distribution of r is independent of that of p 
and is 


Mr) 


1 f 

2wi J y 


(s-r)* M 



dz f 


where the $\kappa_t$ are the characteristic values of q, while the path of integration $\gamma$ proceeds 
from r through the lower half of the complex plane to a point on the real axis exceeding any 
$\kappa_t$ and from there returns to r through the upper half-plane. 

In testing for the presence or absence of serial correlation (or regression) q is the sum of 
products of successive observations, and $\kappa_t = \cos(\pi t/(T + 1))$. Replacing this set of 
discrete values in the above integral by a continuous variable of similar distribution, the 
following approximation to the distribution of r is found: 

T 

h*(r) m -— -2 ir f 2 (sin <t> - r) iT_2 -sin ( 71 - ^Ycos* 

» Jarcin, \ 4 2 / 
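
The characteristic values of the serial-product form mentioned in this abstract can be verified directly. The sketch below assumes q is the sum of products of successive observations, so that its symmetric matrix has 1/2 on the two off-diagonals, and assumes the index t runs from 1 to T; the function names are hypothetical. It checks that cos(pi t/(T + 1)) is a characteristic value by applying the matrix to the vector with components sin(pi t j/(T + 1)).

```python
import math

def serial_product_matrix(T):
    """Symmetric matrix A with x'Ax = sum of x_t * x_{t+1}, t = 1, ..., T - 1."""
    A = [[0.0] * T for _ in range(T)]
    for i in range(T - 1):
        A[i][i + 1] = A[i + 1][i] = 0.5
    return A

def max_eigen_residual(T):
    """Largest entry of |A v - kappa v| over t = 1, ..., T, where
    kappa = cos(pi t / (T + 1)) and v_j = sin(pi t j / (T + 1))."""
    A = serial_product_matrix(T)
    worst = 0.0
    for t in range(1, T + 1):
        kappa = math.cos(math.pi * t / (T + 1))
        v = [math.sin(math.pi * t * j / (T + 1)) for j in range(1, T + 1)]
        Av = [sum(A[i][j] * v[j] for j in range(T)) for i in range(T)]
        worst = max(worst, max(abs(Av[i] - kappa * v[i]) for i in range(T)))
    return worst

print(max_eigen_residual(8))  # should be at rounding-error level
```

Since all characteristic values lie strictly between -1 and 1, the ratio r = q/p is confined to that interval as well.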



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on 

Publications 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secretary-Treasurer, 
elected for a term of one year by a majority ballot at the annual meeting 
of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers and the previous 
President. 

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The president shall designate one of the Vice-Presidents as 
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the question 
at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, three members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Membership, 
two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and 

Committee on Publications 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to 
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the correspondence 
of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the 
end of each year and present an abstract of the same at the annual meeting of the Institute 
after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of 
Directors, shall have charge of all matters connected with the publications of the Institute, 
and of all books, pamphlets, manuscripts and other literary or scientific material 
collected by the Institute. Once a year this Committee shall cause to be printed in the 
Official Journal the Constitution and By-Laws and a classified list of all the Members 
and Fellows of the Institute. 


ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person's name may be stricken from the rolls and all privileges of membership 
withdrawn. Such person may, however, be re-instated by the Board of Directors 
upon payment of the arrears of dues. 

ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amendment 
has been previously approved by the Board of Directors. 
Bethlehem. Passano. 

Bryn Mawr. Geiringer. 

Lewisburg. Benson, Richardson. 
Millersville. Boyer. 

New Kensington. Johner. 

Oakmont. Petrie. 

Overbrook Hills. Watson. 
Philadelphia. Koopmans, Mauchly, 
Shohat. 

Pittsburgh. Blackburn, Calkins, Elkins, 
Hebley, McDiarraid, Netzer, Niver, 
Olds, Savulak. 

State College. Graves, E. Johnson, 
Wagner. 

Philippine Islands. (3) 

Manila. Jaramillo, Mills, Toralballa. 

Rhode Island. (2) 
Providence. Bennett, Feller. 

South Carolina. (2) 

Clemson. Upholt. 

Columbia. J. B. Coleman. 

Texas. (7) 

Austin. Dodd, Mood, Vickery, Villavaso. 
Dallas. Mouzon. 

Lubbock. Michie. 

Waco. Perry. 

Utah. (1) 

Salt Lake City. Woodbury. 



Vermont. (1) 

Bennington. Lundberg. 

Virginia. (11) 

Arlington. Bingham, Caine, Schultz, 
Shelton, Simmons, Thom. 

Dahlgren. Dresch. 

Lexington. Royston. 

Lynchburg. Risley. 

Staunton. Owen. 

Vienna. Brandt. 

Washington. (2) 

Pullman. Vatnsdal. 

Woodinville. Anthony. 

Wisconsin. (7) 

Madison. Beal, Eisenhart, H. P. Evans, 
Fox, Ingraham, Ozanne. 

Milwaukee. Kenney. 

FOREIGN MEMBERS 

Argentina. (2) 

Banfield. Acerboni. 

Buenos Aires. Barral-Souto. 

Brazil. (1) 

Rio de Janeiro. Kingston. 

Canada. (6) 

Chatham, Ontario. Beall. 

Edmonton, Alberta. Keeping. 

Kingston, Ontario. Edgett. 

Toronto, Ontario. De Lury, R. W. B. 
Jackson, Wolfenden. 

China. (3) 

Shanghai. Chang, Shen, Wei. 

England. (2) 

Ilfracombe, Devon. Perryman. 
Manchester. Robs. 

Ireland. (1) 

Cork. M. D. McCarthy. 